This article presents the first comprehensive study of local and global
inference for the smoothing spline estimate in a unified asymptotic
framework. A novel functional Bahadur representation is developed as the
theoretical foundation of this article, and is also of independent interest.
Based on that, we establish four interconnected inference procedures:
(i) Point-wise Confidence Interval; (ii) Local Likelihood Ratio Testing;
(iii) Simultaneous Confidence Band (SCB); (iv) Global Likelihood Ratio Testing.
In particular, our C.I. is proven to be asymptotically valid at any point in
the support, and is shorter than the classical Bayesian C.I. (Wahba, 1983).
We also unveil new Wilks phenomena arising from the local/global likelihood
ratio testing, and further show that the global testing is more
powerful/efficient than the local one in terms of the smaller minimum
separation rate. It is also worth noting that our SCB is the first one
applicable to general quasi-likelihood models. Furthermore, the inference
optimality/efficiency issues are carefully addressed. As a by-product of this
article, we discover a surprising asymptotic equivalence phenomenon between
the periodic and non-periodic smoothing splines in terms of inferences.

1. Introduction. Smoothing spline models provide a very general framework for data analysis,
modeling and learning in a variety of fields; see [57, 58, 21]. As far as we are aware, the existing
literature is mostly concerned with the global convergence properties or methodological studies
of the smoothing spline estimate. Unfortunately, a systematic and rigorous theoretical study of its
asymptotic inference is almost nonexistent. This is partly due to the technical restrictions of
the widely used equivalent kernel method. The novel Functional Bahadur Representation (FBR)
we develop brings several major breakthroughs into the inference studies. The main purpose of
this paper is to propose a series of local and global inference procedures for a univariate smooth
curve with FBR as the theoretical foundation. Moreover, we carefully investigate inference
optimality/efficiency, which has not been well treated in the smoothing spline literature.

In this paper, we consider a general class of nonparametric regression models that covers
least squares regression and logistic regression. The equivalent kernel method has long been used as
a standard tool in dealing with the asymptotics of smoothing splines, but it is restricted to
simple least squares regression; see [48, 38]. Moreover, this classical method only "approximates"
the reproducing kernel function, and the approximation formula becomes extremely complicated as
the smoothness of the regression function increases and the design points are not uniform. To analyze
the smoothing spline estimates in a more feasible way, we develop a novel technical tool via the 
empirical processes techniques, i.e., Functional Bahadur Representation, which directly deals with 
the "exact" reproducing kernel, and thus makes the systematic inference studies possible in the 
general nonparametric regression models. Several new theoretical insights are also obtained through 
its applications. An immediate consequence of FBR is the local behaviors of the smoothing splines, 
i.e., asymptotic normality, which naturally leads to our construction of approximate point-wise 
confidence intervals. The classical Bayesian C.I. in the literature ([56, 40]) is only valid in an average
sense over the observed covariates, and may not be reliable if only evaluated at peaks or troughs, as
pointed out by Nychka (1988). However, our frequentist C.I. is proven to be theoretically valid at any
point, and is even shorter. We next introduce the local likelihood ratio
method for testing the value of the regression function at any point of interest. It is shown that the
null limit distribution is a scaled Chi-square distribution with one degree of freedom, and that its scaling
constant converges to one as the regression function becomes smoother. Therefore,
we have unveiled an interesting Wilks phenomenon arising from this nonparametric local testing,
which injects new theoretical insight into the literature. The delicate issue of testing sensitivity has
also been studied by characterizing the power behavior under a sequence of local alternatives. One
relevant work is for the monotone function but with a rather different null limit distribution; see [3].

In practice, the global inferences are arguably more useful. The simultaneous confidence band 
depicts the global behaviors of the regression function with sufficient accuracy, and its construction 
has been extensively studied in the literature. However, most of the efforts were devoted to the 
simple regression models with either symmetric errors, i.e., the volume of tube method ([51]), or 
additive Gaussian errors based on the kernel or local polynomial estimates, e.g., [22, 10, 17, 60]. By 
incorporating the approach of [5] into the Reproducing Kernel Hilbert Space (RKHS) framework, 
we are able to construct the first SCB applicable to the general nonparametric regression models, 
and prove its theoretical validity based on strong approximation techniques. We further demonstrate
that the minimum bandwidth order of our SCB achieves the lower bound established
in Genovese and Wasserman (2008). Model assessment forms another crucial component of global
inferences; see [23]. Fan et al. (2001) explored the use of the local polynomial estimate in testing
nonparametric regression models by the Generalized Likelihood Ratio Testing (GLRT). Based on the
smoothing spline estimate, we propose an alternative method called the Penalized Likelihood
Ratio Testing (PLRT), and prove that its null limit distribution is nearly Chi-square with diverging
degrees of freedom. Therefore, the Wilks phenomenon previously established for the local testing
continues to hold for the global one but in a more nonparametric manner. Moreover, we demonstrate
that the PLRT achieves the optimal minimax rate for the nonparametric hypothesis testing in the 
sense of Ingster (1993), and also discover that this global testing is more powerful/efficient than 
the local one in terms of the smaller minimum separation rate. Note that most other smoothing 
spline based tests, e.g., LMP and GML tests ([13, 57, 27, 8, 43]), either lead to complicated null 
distributions with nuisance parameters, or have not addressed the optimality issues. One major 
advantage of our PLRT over GLRT is that the specifications of the former null limit distribution 
are only determined by the parameter space, while the latter heavily depends on the choice of kernel 
function. In other words, our PLRT tests the nonparametric models in a more fundamental way. 

In the end, we would like to reiterate the highlights of this paper: 

(i) Our asymptotic C.I. has point-wise consistency and shorter length than the Bayesian C.I.;

(ii) Our SCB is the first one applicable to the general class of nonparametric regression models;

(iii) Our local and global likelihood ratio tests both yield the Wilks phenomenon. More importantly,
we prove that the global testing is more sensitive/powerful than the local one in terms
of the smaller minimum separation rate.

As an important by-product of this paper, we derive the asymptotic equivalence of inferences 
between the periodic and non-periodic smoothing splines under mild conditions; see Remark 5.2. In 
general, our discoveries reveal an intrinsic connection between two rather different basis structures, 
which in turn may be used to facilitate practical implementations. Under-smoothing is
usually needed in nonparametric inference, e.g., [39, 24, 25], and its amount is precisely
quantified for the inference procedures considered in this paper. We also give three general
under-smoothing rules in Remarks 3.1 - 3.3. However, we note that under-smoothing is actually
not needed in the global testing, i.e., PLRT. As will be seen, the innovative FBR is an ideal
theoretical tool for studying the above inference problems.

Our paper is mainly devoted to theoretical studies, and leaves the more practical issues, e.g.,
tuning of the smoothing parameter, and the more challenging adaptive inference as future topics.
The general class of nonparametric models under consideration is fundamentally important, so that
inference on more complicated models, e.g., the multivariate extension, becomes conceptually
simple by applying a similar likelihood-based approach and FBR techniques. In particular, the
semiparametric extension has been investigated in [9]. The rest of the paper is organized as follows.
Section 2 introduces the basic notation, model assumptions, and some preliminary RKHS results. Section 3
presents the key technical tool of this paper, i.e., FBR, and the local asymptotics of the smoothing
spline as a direct application. In Sections 4 and 5, two local and two global inference procedures,
together with their theoretical properties, are formally discussed, respectively. In Section 6, we give
three concrete examples showing the validity of our theories. Numerical studies are also provided
for both periodic and non-periodic splines. All the technical arguments are included in the Appendix
or the Online Supplementary ([46]).

2. Preliminaries. 

2.1. Notations and Assumptions. Suppose that the data $T_i = (Y_i, Z_i)$, $i = 1, \ldots, n$, are i.i.d.
copies of $T = (Y, Z)$, where $Y \in \mathcal{Y} \subseteq \mathbb{R}$ is the response variable, $Z \in \mathbb{I}$ is the covariate variable and
$\mathbb{I} = [0, 1]$. Consider a general class of nonparametric models under the primary assumption that

(2.1)  $\mu_0(Z) \equiv E(Y \mid Z) = F(g_0(Z))$,

where $g_0(\cdot)$ is some unknown smooth function and $F(\cdot)$ is some known link function. It covers
two sub-classes of statistical interest. The first sub-class assumes that the data are modelled by
$y_i \mid z_i \sim p(y_i; \mu_0(z_i))$ for some conditional distribution $p$ unknown up to the parameter $\mu_0$. Instead
of assuming the distributional knowledge, the second sub-class only specifies the moment relation
in the sense that there exists some known positive function $V(\cdot)$ such that $Var(Y \mid Z) = V(\mu_0(Z))$.
The nonparametric estimation of $g$ in the latter is carried out by using the quasi-likelihood
$Q(y; \mu) \equiv \int_y^{\mu} (y - s)/V(s)\,ds$, where $\mu = F(g)$, as an objective function ([59]). Despite distinct modelling
principles, these two sub-classes have a large overlap since $Q(y; \mu)$ coincides with $\log p(y; \mu)$ for
several commonly used distributions under various combinations of $(F, V)$, as summarized in Table 1 below.



$p$       Normal    Logistic                  Gamma($\alpha$, $\beta$)    Poisson      Inverse Gaussian
$F(a)$    $a$       $\exp(a)/(1+\exp(a))$     $\exp(a)$                   $\exp(a)$    $\exp(a)$
$V(s)$    $1$       $s(1-s)$                  $\alpha^{-1}s^2$            $s$          $s^3$

Table 1

Five commonly used distributions together with their mean and variance functions.
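
As a quick numerical illustration of the overlap summarized in Table 1, the sketch below evaluates the quasi-likelihood $Q(y;\mu)$ by composite Simpson quadrature and checks, for the Poisson column ($V(s) = s$), that $Q(y;\mu) - \log p(y;\mu)$ does not depend on $\mu$. The function names `quasi_likelihood` and `poisson_loglik` and the choice of quadrature rule are illustrative assumptions, not part of the paper.

```python
import math

def quasi_likelihood(y, mu, V, steps=2000):
    """Q(y; mu) = integral from y to mu of (y - s) / V(s) ds (composite Simpson rule)."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    if mu == y:
        return 0.0
    width = (mu - y) / steps              # signed step, so mu < y is handled as well
    integrand = lambda s: (y - s) / V(s)
    total = integrand(y) + integrand(mu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(y + k * width)
    return total * width / 3.0

def poisson_loglik(y, mu):
    """log p(y; mu) for a Poisson response with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1.0)

y = 3.0
# Poisson variance function from Table 1: V(s) = s.
gaps = [quasi_likelihood(y, mu, lambda s: s) - poisson_loglik(y, mu)
        for mu in (0.5, 2.0, 7.0)]
spread = max(gaps) - min(gaps)            # numerically zero if Q and log p differ by a mu-free term
```

The same helper with $V(s) = 1$ returns $-(y-\mu)^2/2$, the Normal column of the table.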
From now on, we focus on the smooth criterion function $\ell(y; a): \mathcal{Y} \times \mathbb{R} \mapsto \mathbb{R}$, and allow it to cover
the above two statistical classes, i.e., $\ell(y; a) = Q(y; F(a))$ or $\log p(y; F(a))$. Denote the parameter
space $\mathcal{H}$ as the $m$-th order Sobolev space:

$H^m(\mathbb{I}) = \{g: \mathbb{I} \mapsto \mathbb{R} \mid g^{(j)}$ is absolutely continuous for $j = 0, 1, \ldots, m-1$, and $g^{(m)} \in L_2(\mathbb{I})\}$,

where $m$ is assumed to be known and larger than 1/2. In some cases, $\mathcal{H}$ is also defined as a
subclass of $H^m(\mathbb{I})$, i.e., the homogeneous Sobolev space $H_0^m(\mathbb{I})$, which has the additional restriction
$g^{(j)}(0) = g^{(j)}(1)$ for $j = 0, 1, \ldots, m-1$ (also known as the class of periodic functions). Let
$J(g, \tilde g) = \int_{\mathbb{I}} g^{(m)}(z)\tilde g^{(m)}(z)\,dz$. Consider the penalized nonparametric estimate $\hat g_{n,\lambda}$:

(2.2)  $\hat g_{n,\lambda} = \arg\max_{g \in \mathcal{H}} \ell_{n,\lambda}(g) = \arg\max_{g \in \mathcal{H}} \Big\{ \frac{1}{n}\sum_{i=1}^n \ell(Y_i; g(Z_i)) - (\lambda/2) J(g, g) \Big\}$,

where $J(g, g)$ is the roughness penalty of order $m$ and $\lambda$ is the smoothing parameter converging
to zero as $n \to \infty$. We use $\lambda/2$ (rather than $\lambda$) here only for the simplicity of future expressions. The
existence and uniqueness of $\hat g_{n,\lambda}$ are guaranteed by Theorem 2.9 of [21] when the null space
$\mathcal{N}_m \equiv \{g \in H^m(\mathbb{I}): J(g, g) = 0\}$ is finite dimensional and $\ell(y; a)$ is concave and continuous w.r.t. $a$.
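
To make (2.2) concrete, the following sketch solves the penalized problem in a deliberately simple special case: least squares loss $\ell(y;a) = -(y-a)^2/2$, the equally spaced design $z_i = i/n$, and $g$ restricted to the span of finitely many trigonometric functions $h_\nu$ with $J(h_\nu, h_\nu) = \gamma_\nu$. These restrictions, the basis ordering and all function names are assumptions made for illustration only; this is not the general estimator studied in the paper. On this grid the basis is exactly orthonormal, so the maximizer reduces to coefficient-wise shrinkage by $1/(1 + \lambda\gamma_\nu)$.

```python
import math

def trig_basis(nu, z):
    """Assumed periodic basis: h_0 = 1, h_{2k-1} = sqrt(2) cos(2 pi k z), h_{2k} = sqrt(2) sin(2 pi k z)."""
    if nu == 0:
        return 1.0
    k = (nu + 1) // 2
    angle = 2.0 * math.pi * k * z
    return math.sqrt(2.0) * (math.cos(angle) if nu % 2 == 1 else math.sin(angle))

def penalty_eigenvalue(nu, m):
    """J(h_nu, h_nu) = (2 pi k)^(2m) for the basis above (zero for the constant function)."""
    k = (nu + 1) // 2
    return (2.0 * math.pi * k) ** (2 * m)

def fit_penalized_ls(y, m, lam, n_basis):
    """Coefficients maximizing -(1/(2n)) sum_i (y_i - g(z_i))^2 - (lam/2) J(g, g)
    over g = sum_{nu < n_basis} c_nu h_nu on the grid z_i = i/n."""
    n = len(y)
    if 2 * (n_basis // 2) >= n:
        raise ValueError("highest frequency must stay below n/2 for grid orthonormality")
    coefs = []
    for nu in range(n_basis):
        empirical = sum(y[i] * trig_basis(nu, i / n) for i in range(n)) / n
        coefs.append(empirical / (1.0 + lam * penalty_eigenvalue(nu, m)))
    return coefs

n, m, lam = 64, 2, 1e-4
y = [trig_basis(1, i / n) for i in range(n)]      # noise-free signal equal to h_1
coefs = fit_penalized_ls(y, m, lam, n_basis=11)
```

In this toy run only the coefficient of $h_1$ is nonzero, and it is shrunk from 1 to $1/(1 + \lambda(2\pi)^{4})$, which shows how the penalty damps each frequency separately.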

We next assume some basic model conditions. Let $\mathcal{I}_0$ be the range of $g_0(z)$, which is obviously
compact. Denote the first-, second- and third-order derivatives of $\ell(y; a)$ w.r.t. $a$ by $\dot\ell_a(y; a)$, $\ddot\ell_a(y; a)$
and $\ell'''_a(y; a)$, respectively. We first assume the following smoothness and tail conditions on $\ell$:

Assumption A.1. (a). $\ell(y; a)$ is three times continuously differentiable and concave w.r.t. $a$.
There exists a bounded open interval $\mathcal{I} \supset \mathcal{I}_0$, and positive constants $C_0$ and $C_1$ s.t.

(2.3)  $E\big\{\exp\big(\sup_{a \in \mathcal{I}} |\ddot\ell_a(Y; a)|/C_0\big) \mid Z\big\} \le C_1$, a.s.,

and

(2.4)  $E\big\{\exp\big(\sup_{a \in \mathcal{I}} |\ell'''_a(Y; a)|/C_0\big) \mid Z\big\} \le C_1$, a.s.

(b). There exists a positive constant $C_2$ such that $C_2^{-1} \le I(Z) \equiv -E(\ddot\ell_a(Y; g_0(Z)) \mid Z) \le C_2$, a.s.

(c). $\epsilon \equiv \dot\ell_a(Y; g_0(Z))$ satisfies $E(\epsilon \mid Z) = 0$ and $E(\epsilon^2 \mid Z) = I(Z)$, a.s.

Assumption A.1 (a) implies the slow diverging rate, i.e., $O_P(\log n)$, of
$\max_{1 \le i \le n} \sup_{a \in \mathcal{I}} |\ddot\ell_a(Y_i; a)| \vee |\ell'''_a(Y_i; a)|$. In the case that $\ell(y; a) = \log p(y; F(a))$, Assumption A.1 (b) imposes the boundedness and
positive definiteness of the Fisher information, and Assumption A.1 (c) trivially holds if $p$ satisfies
some regularity conditions. However, when $\ell(y; a) = Q(y; F(a))$, we have

(2.5)  $\ddot\ell_a(Y; a) = F_1(a) + \epsilon F_2(a)$ and $\ell'''_a(Y; a) = \dot F_1(a) + \epsilon \dot F_2(a)$,

where $\epsilon = Y - \mu_0(Z)$, $F_1(a) = -|\dot F(a)|^2/V(F(a)) + (F(g_0(Z)) - F(a))F_2(a)$ and
$F_2(a) = (\ddot F(a)V(F(a)) - \dot V(F(a))|\dot F(a)|^2)/V^2(F(a))$. Hence, Assumption A.1 (a) holds if $F_j(a)$, $\dot F_j(a)$, $j = 1, 2$, are all
bounded over $a \in \mathcal{I}$, and

(2.6)  $E\{\exp(|\epsilon|/C_0) \mid Z\} \le C_1$, a.s.

By (2.5), we have $I(Z) = |\dot F(g_0(Z))|^2/V(F(g_0(Z)))$. Thus, Assumption A.1 (b) holds if

(2.7)  $1/C_2 \le |\dot F(a)|^2 / V(F(a)) \le C_2$ for all $a \in \mathcal{I}$, a.s.

Assumption A.1 (c) follows from the definition of $V(\cdot)$. The sub-exponential tail condition (2.6) and
the boundedness condition (2.7) are very mild quasi-likelihood model assumptions (also assumed in
[37]). The assumption that $F_j$ and $\dot F_j$ are both bounded over $\mathcal{I}$ could be restrictive without
assuming estimation consistency. However, we can remove it in most models, e.g., binary logistic
regression, by applying similar empirical processes arguments as in Section 7 of [37].

2.2. Reproducing Kernel Hilbert Space (RKHS). Some RKHS results are introduced into our
general model framework as slight extensions of [12] and [41], e.g., an important Sobolev norm
(2.8). It is well known that, when $m > 1/2$, $\mathcal{H} = H^m(\mathbb{I})$ is an RKHS, which we endow with the inner
product $\langle g, \tilde g\rangle = E\{I(Z)g(Z)\tilde g(Z)\} + \lambda J(g, \tilde g)$ and the norm

(2.8)  $\|g\|^2 = \langle g, g\rangle$.

The reproducing kernel $K(z_1, z_2)$ defined on $\mathbb{I} \times \mathbb{I}$ is known to have the following property:

$K_z(\cdot) \equiv K(z, \cdot) \in H^m(\mathbb{I})$ and $\langle K_z, g\rangle = g(z)$, for any $z \in \mathbb{I}$ and $g \in H^m(\mathbb{I})$.

Obviously, $K$ is symmetric with $K(z_1, z_2) = K(z_2, z_1)$. We further introduce a positive definite
self-adjoint operator $W_\lambda: H^m(\mathbb{I}) \mapsto H^m(\mathbb{I})$ such that

(2.9)  $\langle W_\lambda g, \tilde g\rangle = \lambda J(g, \tilde g)$,

for any $g, \tilde g \in H^m(\mathbb{I})$. Denote $V(g, \tilde g) = E\{I(Z)g(Z)\tilde g(Z)\}$. Hence, $\langle g, \tilde g\rangle = V(g, \tilde g) + \langle W_\lambda g, \tilde g\rangle$,
which implies $V(g, \tilde g) = \langle (id - W_\lambda)g, \tilde g\rangle$, where $id$ denotes the identity operator.

Below, we assume that there exists a sequence of basis functions in the space $H^m(\mathbb{I})$
that simultaneously diagonalizes the bilinear forms $V$ and $J$. Such an eigenvalue/eigenfunction
assumption is typical in the smoothing spline literature, and is critical for controlling the local behavior
of our penalized estimates. Hereinafter, positive sequences $a_\mu$ and $b_\mu$ satisfying $\lim_{\mu \to \infty}(a_\mu/b_\mu) =
c > 0$ are denoted as $a_\mu \asymp b_\mu$. If $c = 1$, we write $a_\mu \sim b_\mu$. Let $\sum_\nu$ denote the sum over $\nu \in \mathbb{N} =
\{0, 1, 2, \ldots\}$ for convenience. Denote the sup-norm of $g \in H^m(\mathbb{I})$ as $\|g\|_{\sup} = \sup_{z \in \mathbb{I}} |g(z)|$.

Assumption A.2. There exists a sequence of eigenfunctions $h_\nu \in H^m(\mathbb{I})$ satisfying $\sup_{\nu \in \mathbb{N}} \|h_\nu\|_{\sup} <
\infty$, and a nondecreasing sequence of eigenvalues $\gamma_\nu \asymp \nu^{2m}$ such that

(2.10)  $V(h_\mu, h_\nu) = \delta_{\mu\nu}$,  $J(h_\mu, h_\nu) = \gamma_\mu \delta_{\mu\nu}$,  $\mu, \nu \in \mathbb{N}$,

where $\delta_{\mu\nu}$ is the Kronecker delta. Furthermore, any $g \in H^m(\mathbb{I})$ admits the Fourier expansion
$g = \sum_\nu V(g, h_\nu)h_\nu$ with convergence in the $\|\cdot\|$-norm.

Assumption A.2 enables us to derive explicit expressions of $\|g\|$, $K_z(\cdot)$ and $W_\lambda h_\nu(\cdot)$ for any
$g \in H^m(\mathbb{I})$ and $z \in \mathbb{I}$; see Proposition 2.1 below.

Proposition 2.1. For any $g \in H^m(\mathbb{I})$ and $z \in \mathbb{I}$, we have $\|g\|^2 = \sum_\nu |V(g, h_\nu)|^2(1 + \lambda\gamma_\nu)$,
$K_z(\cdot) = \sum_\nu \frac{h_\nu(z)}{1 + \lambda\gamma_\nu}h_\nu(\cdot)$ and $W_\lambda h_\nu(\cdot) = \frac{\lambda\gamma_\nu}{1 + \lambda\gamma_\nu}h_\nu(\cdot)$ under Assumption A.2.

For future theoretical derivations, it is crucially important to give sufficient conditions for
Assumption A.2 in terms of the underlying eigensystem. When $\ell(y; a) = -(y - a)^2/2$ and $\mathcal{H} = H_0^m(\mathbb{I})$,
Assumption A.2 is known to be satisfied if $(\gamma_\nu, h_\nu)$ is chosen as the trigonometric basis (6.2) specified in
Example 6.1. However, for the more general $\ell(y; a)$ with $\mathcal{H} = H^m(\mathbb{I})$, we will show that Assumption
A.2 is still valid if the $(\gamma_\nu, h_\nu)$'s are chosen as the (normalized) solutions of the problem

(2.11)  $(-1)^m h_\nu^{(2m)}(\cdot) = \gamma_\nu I(\cdot)\pi(\cdot)h_\nu(\cdot)$,  $h_\nu^{(j)}(0) = h_\nu^{(j)}(1) = 0$,  $j = m, m+1, \ldots, 2m-1$,

where $\pi(\cdot)$ is the marginal density function of the covariate $Z$. Our proof relies heavily on the ODE
techniques developed in [6, 50].

Let $C^m(\mathbb{I})$ be the class of $m$-th order continuously differentiable functions over $\mathbb{I}$.

Proposition 2.2. If $\pi(z), I(z) \in C^{2m-1}(\mathbb{I})$ and are both bounded away from zero and infinity
over $\mathbb{I}$, then the eigenvalues $\gamma_\nu$'s and the corresponding normalized eigenfunctions $h_\nu$'s, i.e., $V(h_\nu, h_\nu) =
1$, solved from (2.11) satisfy Assumption A.2.

Our Proposition 2.2 can be viewed as a nontrivial extension of Utreras (1988) in which $I = \pi = 1$.

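Finally, we summarize the notation on Fréchet derivatives to be used later. The Fréchet
derivative of $\ell_{n,\lambda}$ can be shown to be, for any $\Delta g, \Delta g_j \in H^m(\mathbb{I})$ and $j = 1, 2, 3$,

$D\ell_{n,\lambda}(g)\Delta g = \frac{1}{n}\sum_{i=1}^n \dot\ell_a(Y_i; g(Z_i))\langle K_{Z_i}, \Delta g\rangle - \langle W_\lambda g, \Delta g\rangle$

$\equiv \langle S_n(g), \Delta g\rangle - \langle W_\lambda g, \Delta g\rangle \equiv \langle S_{n,\lambda}(g), \Delta g\rangle$.

Note that $S_{n,\lambda}(\hat g_{n,\lambda}) = 0$. In particular, $S_{n,\lambda}(g_0)$ is of interest, which can be expressed as

(2.12)  $S_{n,\lambda}(g_0) = \frac{1}{n}\sum_{i=1}^n \epsilon_i K_{Z_i} - W_\lambda g_0$.

The Fréchet derivative of $S_{n,\lambda}$ ($DS_{n,\lambda}$) is denoted by $DS_{n,\lambda}(g)\Delta g_1\Delta g_2$ ($D^2S_{n,\lambda}(g)\Delta g_1\Delta g_2\Delta g_3$), and
can be written as $D^2\ell_{n,\lambda}(g)\Delta g_1\Delta g_2 = n^{-1}\sum_{i=1}^n \ddot\ell_a(Y_i; g(Z_i))\langle K_{Z_i}, \Delta g_1\rangle\langle K_{Z_i}, \Delta g_2\rangle - \langle W_\lambda \Delta g_1, \Delta g_2\rangle$
($D^3\ell_{n,\lambda}(g)\Delta g_1\Delta g_2\Delta g_3 = n^{-1}\sum_{i=1}^n \ell'''_a(Y_i; g(Z_i))\langle K_{Z_i}, \Delta g_1\rangle\langle K_{Z_i}, \Delta g_2\rangle\langle K_{Z_i}, \Delta g_3\rangle$).

Define $S(g) = E\{S_n(g)\}$, $S_\lambda(g) = S(g) - W_\lambda g$ and $DS_\lambda(g) = DS(g) - W_\lambda$, where $DS(g)\Delta g_1\Delta g_2 =
E\{\ddot\ell_a(Y; g(Z))\langle K_Z, \Delta g_1\rangle\langle K_Z, \Delta g_2\rangle\}$. Since $\langle DS_\lambda(g_0)f, g\rangle = -\langle f, g\rangle$ for any $f, g \in
H^m(\mathbb{I})$, we have the following result:

Proposition 2.3. $DS_\lambda(g_0) = -id$, where recall that $id$ is the identity operator on $H^m(\mathbb{I})$.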

3. Functional Bahadur Representation. In this section, we first develop the key technical 
tool of this paper: Functional Bahadur Representation, and then present the local asymptotics of 
the smoothing spline estimate as its straightforward application. In fact, FBR provides the rigorous 
theoretical foundation for the series of inference tools to be established in Sections 4 and 5. 

3.1. Functional Bahadur Representation. We first state the relationship between the $\|\cdot\|_{\sup}$-norm
and the $\|\cdot\|$-norm in Lemma 3.1 below, and then derive a concentration inequality in Lemma 3.2
as the preliminary step in obtaining FBR. Denote $h = \lambda^{1/(2m)}$.

Lemma 3.1. There exists a constant $c_m > 0$ s.t. $|g(z)| \le c_m h^{-1/2}\|g\|$ for any $z \in \mathbb{I}$ and $g \in
H^m(\mathbb{I})$. In particular, $c_m$ does not depend on the choice of $z$ and $g$. Hence, $\|g\|_{\sup} \le c_m h^{-1/2}\|g\|$.

Define

$\mathcal{G} = \{g \in H^m(\mathbb{I}): \|g\|_{\sup} \le 1,\ J(g, g) \le c_m^{-2}h\lambda^{-1}\}$,

where the constant $c_m$ is specified in Lemma 3.1. Recall that the $T_i = (Y_i, Z_i)$'s denote the full data
variables with domain $\mathcal{T}$. Our Lemma 3.2 below proves a concentration inequality for the empirical
process $Z_n(g)$ defined as, for any $g \in \mathcal{G}$ and $z \in \mathbb{I}$,

(3.1)  $Z_n(g)(z) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big[\psi_n(T_i; g)K_{Z_i}(z) - E(\psi_n(T; g)K_Z(z))\big]$,

where $\psi_n(T; g)$ is a real-valued function (possibly depending on $n$) defined on $\mathcal{T} \times \mathcal{G}$.

Lemma 3.2. Suppose that $\psi_n$ satisfies the following Lipschitz continuity:

(3.2)  $|\psi_n(T; f) - \psi_n(T; g)| \le c_m^{-1}h^{1/2}\|f - g\|_{\sup}$ for any $f, g \in \mathcal{G}$,

where $c_m$ is specified in Lemma 3.1. Then we have

$\lim_{n \to \infty} P\Big( \sup_{g \in \mathcal{G}} \|Z_n(g)\| \,/\, [\,\ldots\,] \le (5\log\log n)^{1/2} \Big) = 1$.

To obtain FBR, we need to further assume some proper convergence rate for $\hat g_{n,\lambda}$:

Assumption A.3. $\|\hat g_{n,\lambda} - g_0\| = O_P((nh)^{-1/2} + h^m)$.

A set of simple (but not necessarily the weakest) sufficient conditions for Assumption A.3 is provided
in Proposition 3.3 below. Before stating this result, we introduce another norm on the space $\mathcal{H}$,
which is more commonly used in functional analysis. For any $g \in \mathcal{H}$, define

(3.3)  $\|g\|_{\mathcal{H}}^2 = E\{I(Z)g(Z)^2\} + J(g, g)$.

When $\lambda \le 1$, $\|\cdot\|_{\mathcal{H}}$ is one type of Sobolev norm dominating $\|\cdot\|$ defined in (2.8). Denote

(3.4)  $\lambda^* \asymp n^{-2m/(2m+1)}$, or equivalently, $h^* \asymp n^{-1/(2m+1)}$.

Note that $\lambda^*$ is known as the optimal order of the smoothing parameter when estimating $g_0 \in H^m(\mathbb{I})$.

Proposition 3.3. Suppose that Assumption A.1 holds, and further that $\|\hat g_{n,\lambda} - g_0\|_{\mathcal{H}} = o_P(1)$.
If $h$ satisfies $(n^{1/2}h)^{-1}(\log\log n)^{m/(2m-1)}(\log n)^{2m/(2m-1)} = o(1)$, then Assumption A.3 is valid. In
particular, $\hat g_{n,\lambda}$ achieves the optimal rate of convergence, i.e.,
$O_P(n^{-m/(2m+1)})$, when $\lambda = \lambda^*$.

Now we are ready to present the key technical tool: Functional Bahadur Representation, which
is also of independent interest. By incorporating $\lambda$ into the norm (2.8), we obtain a more powerful
version of the result in Shang (2010) that naturally applies to our general setting for inference purposes.

Theorem 3.4. (Functional Bahadur Representation) Suppose that Assumptions A.1 - A.3 hold,
and that $h = o(1)$ and $nh^2 \to \infty$. Recall that $S_{n,\lambda}(g_0)$ is defined in (2.12). Then we have

(3.5)  $\|\hat g_{n,\lambda} - g_0 - S_{n,\lambda}(g_0)\| = O_P(a_n \log n)$,

where $a_n = n^{-1/2}((nh)^{-1/2} + h^m)h^{-(6m-1)/(4m)}(\log\log n)^{1/2} + C_\ell h^{-1/2}((nh)^{-1} + h^{2m})/\log n$ and
$C_\ell = \sup_{z \in \mathbb{I}} E\{\sup_{a \in \mathcal{I}} |\ell'''_a(Y; a)| \mid Z = z\}$. Also, the RHS of (3.5) is $o_P(n^{-m/(2m+1)})$ when $h = h^*$.

3.2. Local Asymptotic Behaviors. In this section, we obtain the point-wise asymptotics of $\hat g_{n,\lambda}$ as
a direct application of FBR. The equivalent kernel idea may be used for deriving similar results but
is restricted to $L_2$ regression, e.g., [48]. In contrast, our FBR-based proof applies to more
general regression and tackles the problem from a totally new perspective. Notably, our results
reveal that some well known global convergence properties continue to hold in the local sense; see
Remark 3.1. Three types of under-smoothing conditions are summarized in Remarks 3.1 - 3.3.

Theorem 3.5. (General Regression) Let Assumptions A.1 through A.3 be satisfied. Suppose
that $h = o(1)$, $nh^2 \to \infty$, and $a_n \log n = o(n^{-1/2})$, where $a_n$ is defined in Theorem 3.4, as $n \to \infty$.
Furthermore, assume that, for any $z_0 \in \mathbb{I}$,

(3.6)  $hV(K_{z_0}, K_{z_0}) \to \sigma_{z_0}^2$ as $n \to \infty$.

Denote $g_0^* = (id - W_\lambda)g_0$ as the biased "true parameter". Then we have

(3.7)  $\sqrt{nh}\,(\hat g_{n,\lambda}(z_0) - g_0^*(z_0)) \xrightarrow{d} N(0, \sigma_{z_0}^2)$,

where

(3.8)  $\sigma_{z_0}^2 = \lim_{h \to 0} \sum_\nu \frac{h\,|h_\nu(z_0)|^2}{(1 + \lambda\gamma_\nu)^2}$.
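
The limit in (3.6) and (3.8) can be checked numerically in a special case. The sketch below assumes least squares regression with unit error variance (so that $I(Z) \equiv 1$), a uniform design, and the periodic trigonometric eigensystem $h_0 = 1$, $h_{2k-1}(z) = \sqrt{2}\cos(2\pi kz)$, $h_{2k}(z) = \sqrt{2}\sin(2\pi kz)$ with $\gamma_{2k-1} = \gamma_{2k} = (2\pi k)^{2m}$; these choices, the truncation level and the function name are illustrative assumptions. Since $\cos^2 + \sin^2 = 1$, the sum does not depend on $z_0$, and for $m = 2$ it should approach $\pi^{-1}\int_0^\infty (1 + x^4)^{-2}\,dx = 3/(8\sqrt{2})$.

```python
import math

def scaled_kernel_variance(h, m, n_freq=5000):
    """h * V(K_z, K_z) = h * sum_nu h_nu(z)^2 / (1 + lam * gamma_nu)^2 with lam = h^(2m)."""
    lam = h ** (2 * m)
    total = 1.0                                    # nu = 0 term: h_0 = 1, gamma_0 = 0
    for k in range(1, n_freq + 1):
        gamma_k = (2.0 * math.pi * k) ** (2 * m)
        total += 2.0 / (1.0 + lam * gamma_k) ** 2  # cosine and sine terms of frequency k together
    return h * total

limit_m2 = 3.0 / (8.0 * math.sqrt(2.0))            # (1/pi) * integral of (1 + x^4)^(-2) over [0, inf)
values = [scaled_kernel_variance(h, m=2) for h in (0.1, 0.05, 0.02)]
```

In this smooth periodic setting the finite-$h$ values settle on the limit very quickly, which is consistent with treating $\sigma_{z_0}^2$ as a known constant when forming intervals.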

From Theorem 3.5, we immediately have the following result. 



Corollary 3.6. Suppose that the conditions in Theorem 3.5 hold, and

(3.9)  $\lim_{n \to \infty} (nh)^{1/2}(W_\lambda g_0)(z_0) = -b_{z_0}$,

then we have

(3.10)  $\sqrt{nh}\,(\hat g_{n,\lambda}(z_0) - g_0(z_0)) \xrightarrow{d} N(b_{z_0}, \sigma_{z_0}^2)$,

where $\sigma_{z_0}^2$ is defined as in (3.8).

We want to emphasize that our Theorem 3.5 covers a general class of nonparametric models under
penalized estimation. To illustrate Corollary 3.6 in more detail, we consider the $L_2$-regression in
which $W_\lambda g_0(z_0)$ (also $b_{z_0}$) has an explicit expression under the additional boundary condition:

(3.11)  $g_0^{(j)}(0) = g_0^{(j)}(1) = 0$, for $j = m, \ldots, 2m-1$.

Specifically, we consider two separate cases, i.e., $b_{z_0} \ne 0$ and $b_{z_0} = 0$. Our results also apply to the
boundary points at the price of the boundary conditions (3.11). To gain more flexibility, we
provide an alternative set of conditions to (3.11), i.e., (3.14), which can be implied by the so-called
"exponential envelope condition" in [41].

Corollary 3.7. ($L_2$ Regression) Let $m > (3 + \sqrt{5})/4 \approx 1.309$ and $\ell(y; a) = -(y - a)^2/2$.
Suppose that Assumption A.3 and (3.6) hold, and also that the normalized eigenfunctions $h_\nu$'s satisfy
(2.11). Assume that $g_0 \in H^{2m}(\mathbb{I})$ and satisfies $\sum_\nu |V(g_0^{(2m)}, h_\nu)h_\nu(z_0)| < \infty$.

(i). Suppose that $g_0$ satisfies the boundary conditions (3.11). If $h/n^{-1/(4m+1)} \to c > 0$, then we
have, for any $z_0 \in [0, 1]$,

(3.12)  $\sqrt{nh}\,(\hat g_{n,\lambda}(z_0) - g_0(z_0)) \xrightarrow{d} N(b_{z_0}, \sigma_{z_0}^2)$,

where the bias $b_{z_0}$ is a positive multiple (depending on $c$) of $(-1)^{m-1}g_0^{(2m)}(z_0)/\pi(z_0)$.
If $h \asymp n^{-d}$ for some $\frac{1}{4m+1} < d \le \frac{2m}{8m-1}$, then we have, for any $z_0 \in [0, 1]$,

(3.13)  $\sqrt{nh}\,(\hat g_{n,\lambda}(z_0) - g_0(z_0)) \xrightarrow{d} N(0, \sigma_{z_0}^2)$.

(ii). If we replace the boundary condition (3.11) by the following reproducing kernel conditions
that, for any $z_0 \in (0, 1)$, as $h \to 0$,

(3.14)  $\frac{\partial^j}{\partial z^j}K_{z_0}(z)\Big|_{z=0} = o(1)$,  $\frac{\partial^j}{\partial z^j}K_{z_0}(z)\Big|_{z=1} = o(1)$,  for $j = 0, \ldots, m-1$,

then the conclusions in (i) hold for any $z_0 \in (0, 1)$.



In (3.12), we note that the asymptotic bias $b_{z_0}$ is proportional to $g_0^{(2m)}(z_0)$, and the asymptotic
variance $\sigma_{z_0}^2$ can be expressed as a weighted sum of squares of the (infinitely many) basis functions
$h_\nu(z_0)$'s, i.e., (3.8). It is worth pointing out that these observations are consistent with those in the
polynomial spline setting, where the former is proportional to $g_0^{(2m)}(z_0)$ and the latter is a weighted
sum of squares of the (finitely many) normalized B-spline basis functions evaluated at $z_0$; see [61].

Remark 3.1. The existing smoothing spline literature is mostly concerned with global convergence
properties of the estimate. For example, Nychka (1995) (Rice and Rosenblatt (1983)) derived
the global convergence rate in terms of the (integrated) mean squared error. However, we mainly
focus on the local asymptotic behaviors here, and find that those well known global results actually
hold in the local (point-wise) sense as well. Stone (1982) showed that, when $g_0 \in H^m(\mathbb{I})$, the
optimal convergence rate of $\hat g_{n,\lambda}$ (in the global sense) is $O_P(n^{-m/(2m+1)})$. However, to achieve the
above optimal rate, the order of $\lambda$ has to be chosen according to the degree of the regularization.
Specifically, $\lambda$ needs to be chosen as $h^{2m} \asymp n^{-2m/(2m+1)}$ under the $m$-th order Sobolev penalization.
Under the setting of Corollary 3.7 where $g_0 \in H^{2m}(\mathbb{I})$ and the $m$-th order penalty is used, our local
result (3.12) shows that $\hat g_{n,\lambda}(z_0)$ achieves the point-wise rate $O_P(n^{-2m/(4m+1)})$, which turns
out to be the optimal global rate, when $\lambda \asymp n^{-2m/(4m+1)}$. To further remove the asymptotic
estimation bias, we have to sacrifice the convergence rate of $\hat g_{n,\lambda}(z_0)$ in (3.13) by choosing some faster
convergent $\lambda$. This further coincides with the under-smoothing procedure known in the literature.

Remark 3.2. In practice, it might be more convenient to fix $g_0 \in H^m(\mathbb{I})$ and properly tune
the smoothing parameter for removing the estimation bias. For example, in the general regression,
we can achieve this purpose by choosing some faster convergent $\lambda$ than the optimal $\lambda^* \asymp
n^{-2m/(2m+1)}$, i.e., $h^* \asymp n^{-1/(2m+1)}$. Specifically, we can choose $h \asymp n^{-d}$ with $\frac{1}{2m} < d < \frac{2m}{8m-1}$
when $m > 1 + \sqrt{3}/2 \approx 1.866$. It can be checked that the above $h$ satisfies the conditions in
Theorem 3.5. By the reproducing kernel property and (2.9), $|W_\lambda g_0(z_0)| = |\langle W_\lambda g_0, K_{z_0}\rangle| = \lambda|J(g_0, K_{z_0})| =
O(\lambda J(K_{z_0}, K_{z_0})^{1/2})$. By Proposition 2.1 and Lemma 2.2 of [12], $J(K_{z_0}, K_{z_0}) = \sum_\nu \frac{|h_\nu(z_0)|^2\gamma_\nu}{(1 + \lambda\gamma_\nu)^2} \asymp
h^{-(2m+1)}$, which implies $W_\lambda g_0(z_0) = O(\lambda h^{-m-1/2}) = O(h^{m-1/2})$ for any $z_0 \in \mathbb{I}$. Thus, $\sqrt{nh}\,W_\lambda g_0(z_0) =
O(n^{1/2}h^m) = o(1)$, i.e., $b_{z_0} = 0$ in (3.9), is implied by the above range of $h$.
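
The rate bookkeeping in this remark can be mechanized. Writing $h = n^{-d}$ and ignoring logarithmic factors, each requirement (the condition $nh^2 \to \infty$, the separate pieces of $a_n \log n = o(n^{-1/2})$ with $a_n$ as in Theorem 3.4, and the vanishing bias $n^{1/2}h^m = o(1)$) becomes a sign condition on an exponent of $n$. The helper below is a rough sketch under that simplification; the function names and the treatment of $C_\ell$ as a nonzero constant are assumptions made for illustration.

```python
def undersmoothing_exponents(m, d):
    """Exponents of n (logarithms ignored) that must all be negative when h = n^(-d)."""
    blow_up = d * (6 * m - 1) / (4 * m)            # contribution of the factor h^{-(6m-1)/(4m)}
    return {
        "(n h^2)^{-1}": 2 * d - 1,
        "a_n: (nh)^{-1/2} piece": -(1 - d) / 2 + blow_up,
        "a_n: h^m piece": -m * d + blow_up,
        "a_n: h^{-1/2} (nh)^{-1} piece": 0.5 + d / 2 - (1 - d),
        "a_n: h^{-1/2} h^{2m} piece": 0.5 + d / 2 - 2 * m * d,
        "bias n^{1/2} h^m": 0.5 - m * d,
    }

def is_admissible(m, d):
    """True if every rate condition holds for smoothness m and h = n^(-d)."""
    return all(e < 0 for e in undersmoothing_exponents(m, d).values())

window_m2 = [k / 1000 for k in range(1, 500) if is_admissible(2, k / 1000)]
```

For $m = 2$ the admissible window is narrow, roughly $0.25 < d < 0.267$, whereas the optimal-rate choice $d = 1/(2m+1) = 0.2$ fails the bias condition; for $m = 1.5$, below the threshold $1 + \sqrt{3}/2$, no exponent passes all checks.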
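Remark 3.3. In fact, we can also remove the estimation bias while fixing $\mathcal{H} = H^m(\mathbb{I})$ and
employing $\lambda = \lambda^*$ by assuming $\sum_\nu |V(g_0, h_\nu)|\gamma_\nu^{1/2} < \infty$, which might be "the weakest possible
condition". The explanation is as follows. By Proposition 2.1, we have

$|W_\lambda g_0(z_0)| \le \sup_\nu \|h_\nu\|_{\sup} \cdot \sum_\nu |V(g_0, h_\nu)|\frac{\lambda\gamma_\nu}{1 + \lambda\gamma_\nu} = \lambda^{1/2}\sup_\nu \|h_\nu\|_{\sup} \cdot \sum_\nu |V(g_0, h_\nu)|\gamma_\nu^{1/2}\frac{(\lambda\gamma_\nu)^{1/2}}{1 + \lambda\gamma_\nu}$.

Under the above assumption and the inequality $|V(g_0, h_\nu)|\gamma_\nu^{1/2}\frac{(\lambda\gamma_\nu)^{1/2}}{1 + \lambda\gamma_\nu} \le |V(g_0, h_\nu)|\gamma_\nu^{1/2}$, the
dominated convergence theorem implies $\sum_\nu |V(g_0, h_\nu)|\gamma_\nu^{1/2}\frac{(\lambda\gamma_\nu)^{1/2}}{1 + \lambda\gamma_\nu} = o(1)$ as $\lambda \to 0$, and thus $\sqrt{nh}\,W_\lambda g_0(z_0) =
o(\sqrt{nh}\,\lambda^{1/2}) = o(1)$ when $\lambda = \lambda^*$. Since $V(g_0^{(m)}, h_\nu) = (-1)^m\nu^m V(g_0, h_\nu)$, the above assumption
holds when the $V(g_0^{(m)}, h_\nu)$ are absolutely summable, e.g., $g_0^{(m)} \in \mathrm{Lip}_\alpha(\mathbb{I})$ with $\alpha \in (1/2, 1]$, the Lipschitz
functional class with index $\alpha$; see the so-called Wiener algebra in [30].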

4. Local Asymptotic Inferences. We consider inferring $g(\cdot)$ locally by constructing the
point-wise C.I. in Section 4.1 and testing the local hypothesis via the likelihood ratio in Section 4.2. In
particular, the related inference optimality will also be discussed; see Remark 4.1 and Theorem 4.6.

4.1. Point-wise Confidence Interval. We consider the confidence interval for some real-valued
smooth function of $g_0(z_0)$ at any fixed $z_0 \in \mathbb{I}$, denoted as $\rho_0 = \rho(g_0(z_0))$, e.g., $\rho_0 = F(g_0(z_0)) =
E(Y \mid Z = z_0)$. An instance is $\rho_0 = \exp(g_0(z_0))/(1 + \exp(g_0(z_0)))$ for the logistic regression model.
Corollary 3.6 together with the Delta method immediately implies Proposition 4.1 on the point-wise
C.I., in which the asymptotic estimation bias is assumed to be removed, e.g., by under-smoothing.

Proposition 4.1. (Point-wise Confidence Interval) Suppose that the assumptions in Corollary
3.6 hold and the estimation bias asymptotically vanishes, i.e., $\lim_{n \to \infty}(nh)^{1/2}(W_\lambda g_0)(z_0) = 0$. If
$\dot\rho(g_0(z_0)) \ne 0$, we have

$P\Big(\rho_0 \in \Big[\rho(\hat g_{n,\lambda}(z_0)) \pm \Phi(\alpha/2)\,\frac{\dot\rho(\hat g_{n,\lambda}(z_0))\,\sigma_{z_0}}{\sqrt{nh}}\Big]\Big) \longrightarrow 1 - \alpha$,

where $\Phi(\alpha)$ is the lower $\alpha$-th quantile of $N(0, 1)$ and $\dot\rho(\cdot)$ is the first derivative of $\rho(\cdot)$.
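
As an illustration of Proposition 4.1, the sketch below computes the delta-method interval for the logistic example $\rho(a) = \exp(a)/(1 + \exp(a))$. The inputs `g_hat` (standing for the fitted value at $z_0$), `sigma_z0`, `n` and `h` are assumed to be supplied by the user, the function name is hypothetical, and the estimation bias is assumed to have been removed as in the proposition.

```python
import math
from statistics import NormalDist

def logistic_pointwise_ci(g_hat, sigma_z0, n, h, alpha=0.05):
    """Interval rho(g_hat) -/+ z_{1 - alpha/2} * |rho'(g_hat)| * sigma_z0 / sqrt(n h)."""
    rho = 1.0 / (1.0 + math.exp(-g_hat))
    rho_dot = rho * (1.0 - rho)                    # derivative of the logistic function
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # upper quantile; the lower alpha/2 quantile is -z
    half_width = z * rho_dot * sigma_z0 / math.sqrt(n * h)
    return rho - half_width, rho + half_width

lower, upper = logistic_pointwise_ci(g_hat=0.0, sigma_z0=1.0, n=1000, h=0.1)
```

Because the interval is symmetric, using the upper quantile instead of the lower one only changes the sign inside the $\pm$, not the interval itself.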

From now on, we focus on the point-wise C.I. for $g_0(z_0)$ and discuss its optimality at the end. For
simplicity, we consider the setting that $\ell(y; a) = -(y - a)^2/(2\sigma^2)$, $Z \sim \mathrm{Unif}[0, 1]$ and $\mathcal{H} = H_0^m(\mathbb{I})$,
under which Proposition 4.1 implies the following asymptotic 95% C.I. for $g_0(z_0)$:

(4.1)  $\hat g_{n,\lambda}(z_0) \pm 1.96\,\sigma\sqrt{I_2/(n\pi h^\dagger)}$,

where $h^\dagger = h\sigma^{1/m}$ and $I_l = \int_0^\infty (1 + x^{2m})^{-l}\,dx$ for $l = 1, 2$. See Case (I) in Example 6.1 for the
derivation of (4.1). When $\sigma$ is unknown, we may replace it by any consistent estimate. Under mild
conditions, we further prove in Remark 5.2 that the same form of C.I. (4.1) also holds for the cubic
spline, i.e., $\mathcal{H} = H^2(\mathbb{I})$, although the center $\hat g_{n,\lambda}(z_0)$ is different. As far as we are aware, (4.1) is the
first rigorously proven point-wise C.I. for the smoothing spline. However, the major contribution
of this section is the surprising comparison between (4.1) and the classical Bayesian Confidence
Interval proposed by Wahba (1983) and studied by Nychka (1988), even though they are constructed based
on different principles, i.e., frequentist vs. Bayesian. Firstly, we would like to emphasize that the
Bayesian C.I. is only shown to approximately achieve the 95% nominal level in an average sense. In
other words, its average coverage probability over the observed covariates is not exactly 95% even
asymptotically. Secondly, the Bayesian C.I. ignores the important issue of uniformity of coverage
across the design space, and thus may not be reliable if only evaluated at peaks or troughs as
pointed out in [40]. However, our asymptotic C.I. (4.1) is proven to be valid at any point. A more
striking fact is that (4.1) is even shorter than both Wahba's and Nychka's
Bayesian C.I.s. See the discussions below.
For the purpose of comparison, we first derive the asymptotically equivalent versions of the Bayesian C.I.s. Wahba (1983) heuristically proposed the following Bayesian C.I.:

(4.2) $\hat g_{n,\lambda}(z_0) \pm 1.96\,\sigma\sqrt{a(h^\dagger)}$,

where $a(h^\dagger) = n^{-1}\big(1 + (1 + (\pi n h^\dagger)^4)^{-1} + 2\sum_{\nu=1}^{n/2-1}(1 + (2\pi\nu h^\dagger)^4)^{-1}\big)$. Under the assumption that $h^\dagger = o(1)$ and $(nh^\dagger)^{-1} = o(1)$, Lemma 6.1 in Example 6.1 implies that $2\sum_{\nu=1}^{n/2-1}(1 + (2\pi\nu h^\dagger)^4)^{-1} \sim I_1/(\pi h^\dagger) = 4I_2/(3\pi h^\dagger)$ since $I_2/I_1 = 3/4$ when $m = 2$. The asymptotically equivalent version of Wahba's Bayesian C.I. is thus

(4.3) $\hat g_{n,\lambda}(z_0) \pm 1.96\,\sigma\sqrt{(4/3)\cdot I_2/(n\pi h^\dagger)}$.
Nychka (1988) further shortened Wahba's version (4.2) by proposing

(4.4) $\hat g_{n,\lambda}(z_0) \pm 1.96\sqrt{Var(b(z_0)) + Var(v(z_0))}$,

where $b(z_0) = E\{\hat g_{n,\lambda}(z_0)\} - g_0(z_0)$ and $v(z_0) = \hat g_{n,\lambda}(z_0) - E\{\hat g_{n,\lambda}(z_0)\}$, and also showed that

(4.5) $\sigma^2 a(h^\dagger)/(Var(b(z_0)) + Var(v(z_0))) \to 32/27$ as $n \to \infty$ and $Var(v(z_0)) = 8\,Var(b(z_0))$;

see his (2.3) and Appendix. Hence, we have

(4.6) $Var(v(z_0)) \sim \sigma^2\cdot(I_2/(n\pi h^\dagger))$ and $Var(b(z_0)) \sim (\sigma^2/8)\cdot(I_2/(n\pi h^\dagger))$.

Therefore, Nychka's Bayesian C.I. (4.4) is asymptotically equivalent to

(4.7) $\hat g_{n,\lambda}(z_0) \pm 1.96\,\sigma\sqrt{(9/8)\cdot I_2/(n\pi h^\dagger)}$.

In view of (4.3) and (4.7), we find that Wahba's Bayesian C.I. and Nychka's Bayesian C.I. are asymptotically 15.4% and 6.1% wider than our C.I. (4.1), respectively. A similar conclusion holds for $m > 2$. Furthermore, the simulations performed in Example 6.1 empirically verify the superior performance of our C.I. for both periodic and non-periodic splines. Interestingly, our frequentist C.I. (4.1) turns out to be the corrected version of Nychka's Bayesian C.I. (4.4) obtained by removing its random bias term $b(z_0)$; see (4.6). The inclusion of $b(z_0)$ in Nychka's C.I. is problematic in the sense that: (i) it makes the point-wise limit distribution non-normal, leading to biased coverage probability; and (ii) it introduces additional variance that unnecessarily enlarges the interval length. Therefore, by removing $b(z_0)$ from (4.7), we are able to achieve both point-wise consistency and a shorter interval in (4.1) without adding any computational burden.
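
The width comparison can be checked numerically. The following is a minimal sketch, not part of the original analysis: it approximates $I_1$ and $I_2$ by a midpoint rule and compares the half-widths of (4.1), (4.3) and (4.7). The function names and the values of n, h and sigma are illustrative assumptions.

```python
import math

def spline_integral(l, m, grid=20000):
    """Midpoint-rule approximation of I_l = int_0^inf (1 + x^(2m))^(-l) dx.

    The substitution x = t / (1 - t) maps (0, inf) onto (0, 1).
    """
    total = 0.0
    for k in range(grid):
        t = (k + 0.5) / grid
        x = t / (1.0 - t)
        total += (1.0 + x ** (2 * m)) ** (-l) / (1.0 - t) ** 2
    return total / grid

def half_width(n, h, sigma, m=2, inflation=1.0):
    """Half-width 1.96 * sigma * sqrt(inflation * I_2 / (n * pi * h_dagger))."""
    h_dagger = h * sigma ** (1.0 / m)
    i2 = spline_integral(2, m)
    return 1.96 * sigma * math.sqrt(inflation * i2 / (n * math.pi * h_dagger))

n, h, sigma = 500, 0.05, 1.0                        # illustrative values only
ours = half_width(n, h, sigma)                      # form of (4.1)
wahba = half_width(n, h, sigma, inflation=4 / 3)    # form of (4.3)
nychka = half_width(n, h, sigma, inflation=9 / 8)   # form of (4.7)
print(f"I_2 / I_1 = {spline_integral(2, 2) / spline_integral(1, 2):.4f}")
print(f"Wahba excess width:  {100 * (wahba / ours - 1):.2f}%")
print(f"Nychka excess width: {100 * (nychka / ours - 1):.2f}%")
```

The excess widths depend only on the inflation factors 4/3 and 9/8, so they do not change with n, h or sigma.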

Remark 4.1. It follows from Cai and Low (2004) that the lower bound on the length of the point-wise C.I. relies on the modulus of continuity over the parameter space. When the parameter space is $H^m(\mathbb{I})$, Donoho and Liu (1991) showed that the modulus of continuity is of order $n^{-m/(2m+1)}$. The length of our C.I. achieves this lower bound by adding a mild restriction on $g_0 \in H^m(\mathbb{I})$, i.e., $g_0^{(m)}$ has absolutely summable Fourier coefficients, and choosing $\lambda = \lambda^*$; see Remark 3.3.



4.2. Local Likelihood Ratio Test. In this section, we propose the likelihood ratio method for testing the value of $g_0(z_0)$ at any point of interest $z_0 \in \mathbb{I}$. We first show that the null limit distribution is a scaled non-central Chi-square distribution with one degree of freedom, whose specification is jointly determined by the reproducing kernel and the estimation bias, and then establish the central Chi-square limit distribution after removing the estimation bias. We also note that, as the smoothness of the regression function increases, the scaling constant eventually converges to one. Therefore, we have unveiled an interesting Wilks phenomenon (meaning that the asymptotic null distribution is independent of any nuisance parameters) arising from this nonparametric local testing, which injects new theoretical insight into the literature. Hence, the inversion of the likelihood ratio test can be conveniently used to construct the point-wise C.I. for $g_0(z_0)$, and also for $F(g_0(z_0))$ due to the monotonicity of $F(\cdot)$; see Table 1. We further address the tricky testing efficiency/sensitivity issue by studying its power behavior under a sequence of local alternatives. This issue is technically challenging since the testing sensitivity relies on the whole estimated curve even though the test itself is local; see Theorem 4.6. An interesting testing sensitivity comparison will be made between the local LRT and its global counterpart in Section 5.2 in terms of their minimum separation rates. One related reference is Banerjee (2007), who considered a similar test for monotone functions, but his estimation method and null limit distribution are different from ours.

For some pre-fixed point $(z_0, w_0)$, we consider the following hypothesis:

(4.8) $H_0: g(z_0) = w_0$ versus $H_1: g(z_0) \neq w_0$.

The "constrained" penalized log-likelihood is defined as $L_{n,\lambda}(g) = n^{-1}\sum_{i=1}^n \ell(Y_i; w_0 + g(Z_i)) - (\lambda/2)J(g,g)$, where $g \in \mathcal{H}_0 = \{g \in H^m(\mathbb{I}) : g(z_0) = 0\}$. We consider the LRT statistic defined as

(4.9) $LRT_{n,\lambda} = \ell_{n,\lambda}(w_0 + \hat g^0_{n,\lambda}) - \ell_{n,\lambda}(\hat g_{n,\lambda})$,

where $\hat g^0_{n,\lambda}$ is the MLE of $g$ under the local restriction, i.e., $\hat g^0_{n,\lambda} = \arg\max_{g \in \mathcal{H}_0} L_{n,\lambda}(g)$.

Endowed with the norm associated with the inner product $\langle\cdot,\cdot\rangle$, $\mathcal{H}_0$ is a closed subset in $\mathcal{H} = H^m(\mathbb{I})$, and thus a Hilbert space. Proposition 4.2 below says that it also inherits the reproducing kernel and the penalty operator from $\mathcal{H}$. Its proof is trivial, and thus omitted.

Proposition 4.2. (a). Recall that $K(z_1,z_2)$ is the reproducing kernel for $H^m(\mathbb{I})$ under $\langle\cdot,\cdot\rangle$. The bivariate function $K^*(z_1,z_2) = K(z_1,z_2) - K(z_1,z_0)K(z_0,z_2)/K(z_0,z_0)$ is a reproducing kernel in $(\mathcal{H}_0, \langle\cdot,\cdot\rangle)$. That is, for any $z' \in \mathbb{I}$ and $g \in \mathcal{H}_0$, we have $K^*_{z'} \equiv K^*(z',\cdot) \in \mathcal{H}_0$ and $\langle K^*_{z'}, g\rangle = g(z')$.
(b). The operator $W^*_\lambda$ defined by $W^*_\lambda g = W_\lambda g - [(W_\lambda g)(z_0)/K(z_0,z_0)]\cdot K_{z_0}$ is bounded linear from $\mathcal{H}_0$ to $\mathcal{H}_0$ and satisfies $\langle W^*_\lambda g, \tilde g\rangle = \lambda J(g,\tilde g)$, for any $g, \tilde g \in \mathcal{H}_0$.
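
The construction in part (a) can be illustrated with a short sketch. It uses a Gaussian kernel as a hypothetical stand-in for $K$ (the smoothing spline kernel itself is not reproduced here) and checks that every section of the constrained kernel vanishes at $z_0$, as membership in the restricted space requires.

```python
import math

def gauss_kernel(z1, z2, scale=0.2):
    """Stand-in positive-definite kernel; NOT the spline kernel of the paper."""
    return math.exp(-((z1 - z2) ** 2) / (2.0 * scale ** 2))

def constrained_kernel(K, z0):
    """Return K*(z1, z2) = K(z1, z2) - K(z1, z0) K(z0, z2) / K(z0, z0)."""
    def K_star(z1, z2):
        return K(z1, z2) - K(z1, z0) * K(z0, z2) / K(z0, z0)
    return K_star

z0 = 0.3
K_star = constrained_kernel(gauss_kernel, z0)
grid = [i / 10 for i in range(11)]
# Every section K*(z', .) vanishes at z0, so it satisfies the constraint g(z0) = 0.
max_at_z0 = max(abs(K_star(z, z0)) for z in grid)
# The diagonal K*(z, z) = K(z, z) - K(z, z0)^2 / K(z0, z0) stays non-negative.
min_diag = min(K_star(z, z) for z in grid)
print(max_at_z0, min_diag)
```

The subtraction is a rank-one correction, so any routine that evaluates $K$ can evaluate $K^*$ at the cost of two extra kernel calls.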

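Given Proposition 4.2, we are ready to derive the restricted FBR for $\hat g^0_{n,\lambda}$, which is used to obtain the null limit distribution. We first define the Fréchet derivatives of $L_{n,\lambda}$ (under $H_0$) by modifying those of $\ell_{n,\lambda}$ as follows: replace $g$, $K_{Z_i}$ and $W_\lambda$ by $w_0 + g$, $K^*_{Z_i}$ and $W^*_\lambda$, respectively. For example,

$$DL_{n,\lambda}(g)\Delta g = n^{-1}\sum_{i=1}^n \dot\ell_a(Y_i; w_0 + g(Z_i))\langle K^*_{Z_i}, \Delta g\rangle - \langle W^*_\lambda g, \Delta g\rangle = \langle S^0_n(g), \Delta g\rangle - \langle W^*_\lambda g, \Delta g\rangle \equiv \langle S^0_{n,\lambda}(g), \Delta g\rangle.$$

Similarly, we have $S^0_{n,\lambda}(\hat g^0_{n,\lambda}) = 0$. Also define $S^0(g)\Delta g = E\{\langle S^0_n(g), \Delta g\rangle\}$ and $S^0_\lambda(g)\Delta g = S^0(g)\Delta g - \langle W^*_\lambda g, \Delta g\rangle$. As for the second derivatives, we have $DS^0_{n,\lambda}(g)\Delta g_1\Delta g_2 = D^2L_{n,\lambda}(g)\Delta g_1\Delta g_2$ and $DS^0_\lambda(g)\Delta g_1\Delta g_2 = DS^0(g)\Delta g_1\Delta g_2 - \langle W^*_\lambda\Delta g_1, \Delta g_2\rangle$, where

$$DS^0(g)\Delta g_1\Delta g_2 = E\{\ddot\ell_a(Y; w_0 + g(Z))\langle K^*_Z, \Delta g_1\rangle\langle K^*_Z, \Delta g_2\rangle\}.$$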
Similar to Theorem 3.4, we need an additional rate assumption for the restricted FBR result:

Assumption A.4. Under $H_0$, $\|\hat g^0_{n,\lambda} - g^0_0\| = O_P((nh)^{-1/2} + h^m)$, where $g^0_0 = (g_0(\cdot) - w_0) \in \mathcal{H}_0$.

Assumption A. 4 is easy to verify by assuming (2.3), (2.4) and \\g^ \—gQ\\n = op(l). The argument 
is similar as the proof of Proposition 3.3 by replacing H with its subspace Hq. 

Theorem 4.3. (Restricted FBR) Suppose that Assumptions A.1, A.2, A.4, and $H_0$ are satisfied. If $h = o(1)$ and $nh^2 \to \infty$, then $\|\hat g^0_{n,\lambda} - g^0_0 - S^0_{n,\lambda}(g^0_0)\| = O_P(a_n\log n)$.

Our main result on the local LRT is presented below. Define $r_n = (nh)^{-1/2} + h^m$.

Theorem 4.4. (Local Likelihood Ratio Testing) Suppose that Assumptions A.1 through A.4 are satisfied. Also assume that $h = o(1)$, $nh^2 \to \infty$, $a_n = o(\min\{r_n, n^{-1}r_n^{-1}(\log n)^{-1}, n^{-1/2}(\log n)^{-1}\})$, and $r_n^2h^{-1/2} = o(a_n)$. Furthermore, for any $z_0 \in [0,1]$, if $n^{1/2}(W_\lambda g_0)(z_0)/\sqrt{K(z_0,z_0)} \to -c_{z_0}$,

(4.10) $\lim_{h\to 0} hV(K_{z_0},K_{z_0}) = \sigma_{z_0}^2 > 0$ and $\lim_{\lambda\to 0} E\{I(Z)|K_{z_0}(Z)|^2\}/K(z_0,z_0) \equiv c_0 \in (0,1]$,

then we obtain: (i). $\|\hat g_{n,\lambda} - \hat g^0_{n,\lambda} - w_0\| = O_P(n^{-1/2})$; (ii). $-2n\cdot LRT_{n,\lambda} = n\|\hat g_{n,\lambda} - \hat g^0_{n,\lambda} - w_0\|^2 + o_P(1)$;

(4.11) (iii). $-2n\cdot LRT_{n,\lambda} \xrightarrow{d} c_0\,\chi_1^2(c_{z_0}^2/c_0)$,

with non-centrality parameter $c_{z_0}^2/c_0$, under $H_0$.

Note that the parametric convergence rate stated in (i) of Theorem 4.4 is reasonable since our restriction is local. By Proposition 2.1, it can be explicitly shown that

(4.12) $c_0 = \lim_{\lambda\to 0}\frac{Q_2(\lambda,z_0)}{Q_1(\lambda,z_0)}$, where $Q_l(\lambda,z) \equiv \sum_\nu\frac{|h_\nu(z)|^2}{(1+\lambda\gamma_\nu)^l}$ for $l = 1,2$.

The reproducing kernel $K$ is uniquely determined for any Hilbert space if it exists; see [14]. So, $c_0$ defined in (4.10) is determined only by the parameter space. Hence, different choices of $(\gamma_\nu, h_\nu)$ in (4.12) give exactly the same value of $c_0$, although some particular choice will facilitate its computation. For example, when $\mathcal{H} = H_0^m(\mathbb{I})$, we can explicitly calculate the value of $c_0$ as 0.75 (0.83) when $m = 2$ (3) by choosing the trigonometric basis (6.2). Interestingly, in the more general $H^2(\mathbb{I})$, we can obtain the same value of $c_0$ even without specifying its (rather different) eigensystem under mild conditions; see Remark 5.2. However, the value of $c_{z_0}$ in (4.11) partly depends on the asymptotic estimation bias (see (3.9)), whose estimation is notoriously difficult. Fortunately, under various under-smoothing conditions, we can show $c_{z_0} = 0$, and thus establish the central Chi-square limit distribution. For example, we can choose a faster convergent smoothing parameter when fixing $g_0 \in H^m(\mathbb{I})$ as in Remark 3.2. Alternatively, we can insist on using $\lambda^*$ but assume a parameter space with more smoothness; see Remark 3.1. In Corollary 4.5, we explore the latter approach in more detail.
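
The quoted values of the scaling constant can be reproduced with a small numerical sketch of the ratio in (4.12). It rests on assumptions that are not taken from the text: a periodic eigensystem in which each frequency $k \ge 1$ contributes a cosine/sine pair with summed squared magnitude twice that of the constant term, and $\lambda\gamma_\nu = (2\pi k h^\dagger)^{2m}$, with a small fixed $h^\dagger$ standing in for the limit $\lambda \to 0$.

```python
import math

def q_sum(l, m, h_dagger, cutoff=50.0):
    """Truncated 1 + 2 * sum_{k>=1} (1 + (2*pi*k*h_dagger)^(2m))^(-l)."""
    k_max = int(cutoff / (2.0 * math.pi * h_dagger))
    total = 1.0
    for k in range(1, k_max + 1):
        total += 2.0 * (1.0 + (2.0 * math.pi * k * h_dagger) ** (2 * m)) ** (-l)
    return total

def c0_approx(m, h_dagger=1e-3):
    """Finite-h proxy for c_0 = lim Q_2 / Q_1 in (4.12)."""
    return q_sum(2, m, h_dagger) / q_sum(1, m, h_dagger)

for m in (2, 3, 10):
    print(m, round(c0_approx(m), 3))
```

Under these assumptions the proxy is close to 0.75 for $m = 2$ and 0.83 for $m = 3$, and it moves toward one as $m$ grows.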

Corollary 4.5. Suppose that Assumptions A.1 through A.4 are satisfied and $H_0$ holds. Let $m > 1 + \sqrt{3}/2 \approx 1.866$. Also assume that the Fourier coefficients $\{V(g_0,h_\nu)\}_\nu$ of $g_0$ satisfy $\sum_\nu |V(g_0,h_\nu)|^2\gamma_\nu^d < \infty$ for some $d > 1 + 1/(2m)$, which is implied by $g_0 \in H^{md}(\mathbb{I})$. Furthermore, if (4.10) is satisfied for any $z_0 \in [0,1]$, then (4.11) holds with limiting distribution $c_0\chi_1^2$ given $\lambda = \lambda^*$.

Corollary 4.5 discovers a nonparametric type of Wilks phenomenon arising from local hypothesis testing, which converts into the classical one in the parametric setup as $m \to \infty$ since $\lim_{m\to\infty} c_0 = 1$. Our result delivers new theoretical insight into nonparametric local hypothesis testing; see its global counterpart in Section 5.2.
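
In the central case of Corollary 4.5, the test reduces to comparing $-2n\cdot LRT_{n,\lambda}$ with a quantile of $c_0\chi_1^2$. The sketch below shows that comparison. The function names and the numerical inputs are illustrative assumptions, and the LRT value is taken as given rather than computed from data.

```python
from statistics import NormalDist

def local_lrt_critical_value(c0, alpha=0.05):
    """Upper-alpha critical value of c0 * chi^2_1 (central case, c_{z0} = 0).

    A chi^2_1 variable is a squared standard normal, so its (1 - alpha)
    quantile equals the squared (1 - alpha / 2) normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return c0 * z * z

def reject_local_null(lrt_value, n, c0, alpha=0.05):
    """Reject H_0: g(z0) = w0 when -2 n LRT exceeds the critical value."""
    return -2.0 * n * lrt_value > local_lrt_critical_value(c0, alpha)

print(local_lrt_critical_value(0.75))   # scaling constant quoted for m = 2
print(local_lrt_critical_value(1.0))    # classical chi^2_1 benchmark
print(reject_local_null(lrt_value=-0.004, n=500, c0=0.75))
```

Inverting the same comparison over candidate values $w_0$ gives a point-wise C.I. for $g_0(z_0)$.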

Finally, we discuss the efficiency/sensitivity of our local LRT by characterizing the limiting power under a sequence of local alternatives converging to the null. Let $\eta_n$ be any positive sequence converging to zero. Consider the local alternative $H_{1n}: g = g_{n0}$, where $g_{n0} = g^* + \eta_n f_n$, and $g^*, f_n \in H^m(\mathbb{I})$ satisfy $g^*(z_0) = w_0$ and, as $n \to \infty$,

(4.13) \f n (z )\ -> oo, J(f n , f n ) < C^nA^)" 1 and n V 2 n V(f n , /„) -> r 2 2 Q , 

under $g = g^*$ for some constants $C_a > 0$ and $\tau_{z_0}^2$. Under the above design of $H_{1n}$, we have $g = g_{n0} \in H^m(\mathbb{I})$ and $g(z_0) = g_{n0}(z_0) \neq w_0$ (asymptotically), i.e., $H_0$ does not hold. This well-constructed sequence of local alternatives can be used to examine how much deviation of $g_{n0}(z_0)$ from $g^*(z_0)$ (or equivalently, $w_0$) within $H^m(\mathbb{I})$ triggers the rejection of $H_0$ using $LRT_{n,\lambda}$. Theorem 4.6 explicitly says that $H_{1n}$ can be detected when $g_{n0}(z_0)$ and $w_0$ are separated by a distance converging to zero at some rate no faster than $n^{-m/(2m+1)}$, which is further proven to be a sharp bound. This minimum separation rate $n^{-m/(2m+1)}$ is achieved under a smoothing parameter of the same order as the optimal one in estimation, i.e., $\lambda = \lambda^*$. We also note that the minimum separation rate coincides with the minimum length of the point-wise C.I. established in [36]; see Remark 4.1. As for the global likelihood ratio testing, Theorem 5.4 derives a faster minimum separation rate, i.e., $n^{-2m/(4m+1)}$, indicating that the global testing is actually more powerful/sensitive. This surprising difference in the minimum separation rates turns out to be reasonable on reflection: in the local testing, the data information is not as fully used as in the global one, which leads to a slower minimum separation rate as a tradeoff.
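
The gap between the two rates is easy to tabulate. The following sketch simply evaluates the exponents $m/(2m+1)$ and $2m/(4m+1)$; the sample size used is an illustrative assumption.

```python
def local_rate_exponent(m):
    """Exponent a in the local separation rate n^(-a), a = m / (2m + 1)."""
    return m / (2 * m + 1)

def global_rate_exponent(m):
    """Exponent a in the global separation rate n^(-a), a = 2m / (4m + 1)."""
    return 2 * m / (4 * m + 1)

n = 10_000  # illustrative sample size
for m in (2, 3, 4):
    a_loc, a_glob = local_rate_exponent(m), global_rate_exponent(m)
    print(f"m={m}: local n^-{a_loc:.3f} = {n ** -a_loc:.4f}, "
          f"global n^-{a_glob:.3f} = {n ** -a_glob:.4f}")
```

The global exponent exceeds the local one for every $m$, and both approach 1/2, the parametric rate, as $m$ grows.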

Theorem 4.6. Let m > 1 + V3/2 rj 1.866, h X n~ d for < d < and 7] n > 

$(nh)^{-1/2} + h^m$. Assume that $\ell(Y;g)$ is the log-density. Suppose that, under both $g = g^*$ and $g = g_{n0}$, Assumptions A.1 through A.4 are satisfied, e.g., Assumption A.1 holds with $g_0$ therein replaced by $g^*$ and $g_{n0}$, respectively, $\sum_\nu |V(g^*,h_\nu)|\gamma_\nu^{1/2} \le C^*$ for some positive constant $C^*$ unrelated to $n$, and (4.10) holds. Then for any $\delta \in (0,1)$, there exists a sufficiently large constant $N$ such that

(4.14) $\inf_{n\ge N} P(\text{reject } H_0 \mid H_{1n} \text{ is true}) \ge 1 - \delta$.

The lower bound of $\eta_n$, i.e., $n^{-m/(2m+1)}$, is achieved when $h = h^*$. If $\eta_n = o(n^{-m/(2m+1)})$, then we can find a sequence of functions $f_n$ satisfying (4.13) such that (4.14) does not hold. Thus, $n^{-m/(2m+1)}$ is the minimum separation rate for the local LRT to detect $H_{1n}$.

The log-density condition in Theorem 4.6 is only assumed for simplicity, and can be easily relaxed by assuming that $P_{g_{n0}}$ is contiguous with respect to $P_{g^*}$, where $P_g$ denotes the distribution under the model parameter $g$. The contiguity assumption can be verified using Le Cam's first lemma, i.e., Theorem 3.10.2 of [55]. We want to point out that the techniques in the proof of Theorem 4.6 are very generic and can be applied to derive the minimum separation rate in local testing based on other test statistics, e.g., $T_{n,\lambda} = \sqrt{nh}\,(\hat g_{n,\lambda}(z_0) - w_0)/\sigma_{z_0}$, which is essentially the same.

5. Global Asymptotic Inferences. Depicting the global behavior of a smooth function is 
crucially important in practice. In Sections 5.1 and 5.2, we develop the global counterparts of 
Section 4 by constructing the simultaneous confidence band and testing global hypothesis via 
likelihood ratio. Again, the FBR is the key ingredient in the theoretical studies. 
5.1. Simultaneous Confidence Band. In this section, we establish the simultaneous confidence band (SCB) for $g(z)$ following the approach of Bickel and Rosenblatt (1973). The proposed SCB centers around $\hat g_{n,\lambda}(z)$ with a bandwidth that is $\sqrt{\log n}$ times wider than the asymptotic point-wise C.I., and is proven to be asymptotically valid over any compact subset in $(0,1)$ based on the FBR and strong approximation techniques. The approach of Bickel and Rosenblatt (1973) was originally developed in the density estimation context, and has since been extended to M-estimation ([22]) and local polynomial estimation ([10]). For example, the SCB is constructed for (generalized) varying-coefficient models based on the latter method; see [17, 60]. The volume of tube method ([51]) is another approach, but requires the error distribution to be symmetric; see its applications in [61, 32]. All the models considered above require the error to be additive and Gaussian. Sun, Loader and McCormick (2000) relaxed the restrictive error assumption of [51] in generalized linear models, but had to translate the nonparametric estimation into a parametric one. As far as we are aware, we construct the first SCB for the general class of nonparametric models including logistic regression. In particular, the minimum bandwidth of our SCB is shown to achieve the lower bound established in Genovese and Wasserman (2008). In addition, the equivalent kernel conditions assumed in this section imply an interesting by-product: the asymptotic lengths of our point-wise C.I.s (and also the scaling constants in the null limit distribution (4.11)) based on the cubic spline and the periodic spline are actually the same despite their different eigensystems; see Remark 5.2.

One key set of conditions assumed in this section is the strong approximation conditions (5.1) - (5.3). Specifically, we assume that there exists a real function $\omega(\cdot)$ defined on $\mathbb{R}$ satisfying, for any fixed $0 < \varphi < 1$, $h^\varphi \le z \le 1 - h^\varphi$ and $t \in \mathbb{I}$,

(5.1) $\left|\frac{d^j}{dt^j}\big(h^{-1}\omega((z-t)/h) - K(z,t)\big)\right| \le C_K h^{-(j+1)}\exp(-C_2h^{-1+\varphi})$ for $j = 0, 1$,

where $C_2$, $C_K$ are some positive constants. Condition (5.1) implies that $\omega$ is an equivalent kernel of the reproducing kernel function $K$ with a certain degree of approximation accuracy. Meanwhile, we also require some regularity conditions on $\omega$. In particular, we assume that

(5.2) $|\omega(u)| \le C_\omega\exp(-|u|/C_3)$, $|\omega'(u)| \le C_\omega\exp(-|u|/C_3)$, for any $u \in \mathbb{R}$,

and that there exists a constant $0 < p \le 2$ s.t.

(5.3) $\int_{-\infty}^{\infty}\omega(t)\omega(t+z)\,dt = \sigma_\omega^2 - C_p|z|^p + o(|z|^p)$, as $|z| \to 0$,

where $C_3$, $C_\omega$, $C_p$ are some positive constants and $\sigma_\omega^2 = \int_{\mathbb{R}}\omega(t)^2\,dt$. The following exponential envelope condition is also needed:

(5.4) $\sup_{z,t\in\mathbb{I}}\left|\frac{\partial}{\partial z}K(z,t)\right| = O(h^{-2})$.

Theorem 5.1. (Simultaneous Confidence Band) Suppose Assumptions A.1 through A.3 are satisfied, and $Z$ is uniform on $\mathbb{I}$. Let $m > (3+\sqrt{5})/4 \approx 1.3091$ and $h = n^{-\delta}$ for any $\delta \in (0, 2m/(8m-1))$. Furthermore, assume that there exist positive constants $C_0$ and $C_1$ such that $E\{\exp(|\epsilon|/C_1) \mid Z\} \le C_0$, a.s., and that (5.1) - (5.4) hold. The conditional density of $\epsilon$ given $Z = z$, namely $\pi(\epsilon|z)$, is assumed to satisfy, for some positive constants $\rho_1$ and $\rho_2$,

(5.5) $\left|\frac{d}{dz}\log\pi(\epsilon|z)\right| \le \rho_1(1 + |\epsilon|^{\rho_2})$ for any $\epsilon \in \mathbb{R}$ and $z \in \mathbb{I}$.

Then, we have, for any $0 < \varphi < 1$ and $u \in \mathbb{R}$,

(5.6) $P\Big((2\delta\log n)^{1/2}\Big\{\sup_{h^\varphi \le z \le 1-h^\varphi}(nh)^{1/2}\sigma_\omega^{-1}I(z)^{-1/2}\,|\hat g_{n,\lambda}(z) - g_0(z) + (W_\lambda g_0)(z)| - d_n\Big\} \le u\Big) \to \exp(-2\exp(-u))$,

where $d_n$ is some constant relying merely on $h$, $p$, $\varphi$ and $C_p$.

The FBR developed in Section 3.1 and the strong approximation techniques ([5]) are crucial to the proof of Theorem 5.1. The uniform distribution condition on $Z$ is only assumed for simplicity, and can be relaxed to a density that is bounded away from zero and infinity. Condition (5.5) is easy to check in various situations. For example, it holds for the conditional normal model, i.e., $\epsilon \mid Z = z \sim N(0,\sigma^2(z))$, if $\sigma^2(z)$ satisfies $\inf_z\sigma^2(z) > 0$, and $\sigma(z)$ and $\sigma'(z)$ both have finite upper bounds. The existence of the bias term $W_\lambda g_0(z)$ in the SCB (5.6) may lead to poor small-sample performance. We avoid the bias estimation by a slight under-smoothing, which is also advocated by [39], following earlier results of [24, 25] where it is shown that under-smoothing is more efficient than explicit bias correction when the goal is to minimize the coverage error. Specifically, this bias effect will asymptotically disappear if we assume:

(5.7) $\lim_{n\to\infty}\Big\{\sup_{h^\varphi \le z \le 1-h^\varphi}\sqrt{nh\log n}\,|W_\lambda g_0(z)|\Big\} = 0$.

Condition (5.7) is slightly stronger than the under-smoothing condition $\sqrt{nh}\,(W_\lambda g_0)(z_0) = o(1)$ assumed for the C.I. in Proposition 4.1. Due to the uniform boundedness of the $h_\nu$'s in Assumption A.2 and the generalized Fourier expansion of $W_\lambda g_0$, it is easy to show that (5.7) is satisfied if we (i) increase the smoothness of $g_0$; (ii) choose some suboptimal smoothing parameter; or (iii) assume slightly stronger conditions on $g_0$; see Remarks 3.1 - 3.3.

Proposition 5.2 reveals the validity of Conditions (5.1) - (5.3) in the setting of $L_2$ regression. The proof relies on the explicit construction of an equivalent kernel for various $m$ in [38]. Here we only consider $m = 2$ for simplicity.

Proposition 5.2. ($L_2$ regression) Consider the setting that $\ell(y;a) = -(y-a)^2/(2\sigma^2)$, $Z \sim \mathrm{Unif}[0,1]$ and $\mathcal{H} = H^2(\mathbb{I})$, i.e., $m = 2$. Then, (5.1)-(5.3) hold with $\omega(t) = \sigma^{2-1/m}\omega_0(\sigma^{-1/m}t)$ for $t \in \mathbb{R}$, where $\omega_0(t) = \frac{1}{2\sqrt{2}}\exp(-|t|/\sqrt{2})\big(\cos(t/\sqrt{2}) + \sin(|t|/\sqrt{2})\big)$. In particular, (5.3) holds for arbitrary $p \in (0,2]$ and $C_p = 0$.
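
The numerical constants attached to this equivalent kernel can be checked directly. The sketch below evaluates $\omega_0$ with the normalizing factor $1/(2\sqrt{2})$, which is an assumption chosen so that $\omega_0(0)$ matches the value 0.3535534 used in Remark 5.2, and integrates $\omega_0^2$ with a midpoint rule on a truncated range.

```python
import math

def omega0(t):
    """omega_0(t) = exp(-|t|/sqrt2) (cos(t/sqrt2) + sin(|t|/sqrt2)) / (2 sqrt2)."""
    s = abs(t) / math.sqrt(2.0)
    return math.exp(-s) * (math.cos(s) + math.sin(s)) / (2.0 * math.sqrt(2.0))

def squared_l2_norm(f, upper=60.0, grid=200000):
    """Midpoint rule for int_R f(t)^2 dt, for f even and negligible beyond `upper`."""
    step = upper / grid
    return 2.0 * step * sum(f((k + 0.5) * step) ** 2 for k in range(grid))

sigma2_omega0 = squared_l2_norm(omega0)
print(omega0(0.0))                    # kernel height at zero
print(sigma2_omega0)                  # squared L2 norm of omega_0
print(sigma2_omega0 / omega0(0.0))    # ratio entering c_0 in Remark 5.2
```

The ratio in the last line is the quantity that Remark 5.2 identifies with the scaling constant of the local LRT for the cubic spline.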

Remark 5.1. In the setting of Proposition 5.2, we are able to explicitly find the constants $\sigma_\omega^2$ and $d_n$ in Theorem 5.1. Specifically, it is trivial to calculate that $\sigma_\omega^2 = 0.265165\,\sigma^{7/2}$ since $\sigma_{\omega_0}^2 = \int_{\mathbb{R}}|\omega_0(t)|^2\,dt = 0.265165$ and $m = 2$. Since $C_p = 0$ for arbitrary $p \in (0,2]$, by the formula
$B(t)$ in Theorem A1 of [5], we know that

,, 8 ) 

When $p = 2$, the above $d_n$ is simplified as $(2\log(h^{-1} - 2h^{\varphi-1}))^{1/2}$. In general, we know that $d_n \sim (-2\log h)^{1/2} \asymp \sqrt{\log n}$ for sufficiently large $n$ since $h = n^{-\delta}$. Given that the estimation bias is removed, e.g., under (5.7), we have the following $100\times(1-\alpha)\%$ SCB:

(5.9) $\big\{\hat g_{n,\lambda}(z) \pm 0.5149418\,(nh)^{-1/2}\hat\sigma^{3/4}\big(c^*_\alpha/\sqrt{-2\log h} + d_n\big) : h^\varphi \le z \le 1 - h^\varphi\big\}$,

where $d_n$ is given in (5.8), $c^*_\alpha = -\log(-\log(1-\alpha)/2)$ and $\hat\sigma$ is a consistent estimate of $\sigma$. Note that we exclude the boundary points in (5.9). To obtain uniform coverage, we have to sacrifice a bit by increasing the bandwidth up to $\sqrt{\log n}$-order over the length of the point-wise C.I., e.g., (4.1).
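
The ingredients of the band can be assembled in a few lines. The sketch below is a hedged illustration for $m = 2$: it uses the simplified $p = 2$ form of $d_n$ quoted in this remark, and the values of n, h, the boundary exponent and the noise level are assumptions made only for the example.

```python
import math

def gumbel_type_quantile(alpha):
    """c*_alpha = -log(-log(1 - alpha) / 2), solving exp(-2 exp(-c)) = 1 - alpha."""
    return -math.log(-math.log(1.0 - alpha) / 2.0)

def scb_half_width(n, h, phi, sigma_hat, alpha=0.05):
    """Half-width of the band (5.9) for m = 2, using the p = 2 form of d_n."""
    d_n = math.sqrt(2.0 * math.log(1.0 / h - 2.0 * h ** (phi - 1.0)))
    c_star = gumbel_type_quantile(alpha)
    scale = 0.5149418 * sigma_hat ** 0.75 / math.sqrt(n * h)
    return scale * (c_star / math.sqrt(-2.0 * math.log(h)) + d_n)

def pointwise_half_width(n, h, sigma_hat):
    """Point-wise analogue: 1.96 (nh)^(-1/2) sigma^(3/4) sigma_{omega_0}."""
    return 1.96 * 0.5149418 * sigma_hat ** 0.75 / math.sqrt(n * h)

n, h, phi, sigma_hat = 1000, 0.05, 0.5, 1.0   # illustrative values only
print(gumbel_type_quantile(0.05))
print(scb_half_width(n, h, phi, sigma_hat), pointwise_half_width(n, h, sigma_hat))
```

The band is constant in $z$ over the interior region, so the price of simultaneous coverage shows up only in the multiplier that replaces 1.96.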

Remark 5.2. One interesting by-product we discover in the setting of Proposition 5.2 is that the point-wise C.I.s for $g_0(z_0)$ based on the cubic spline and the periodic spline share the same asymptotic length at any fixed $z_0 \in (0,1)$. This result is a bit surprising since these two splines have very distinct eigensystems. Under (5.1), it can be shown that

$$\sigma_{z_0}^2 \sim \sigma^{-2}h\int_0^1|K(z_0,z)|^2\,dz \sim \sigma^{-2}h^{-1}\int_0^1\Big|\omega\Big(\frac{z-z_0}{h}\Big)\Big|^2dz = \sigma^{-2}\int_{-z_0/h}^{(1-z_0)/h}|\omega(s)|^2\,ds \sim \sigma^{-2}\int_{\mathbb{R}}|\omega(s)|^2\,ds = \sigma^{-2}\sigma_\omega^2$$

given the choice of $\omega$ in Proposition 5.2. Thus, Corollary 3.6 implies the following 95% C.I.

(5.10) $\hat g_{n,\lambda}(z_0) \pm 1.96\,(nh)^{-1/2}\sigma^{3/4}\sigma_{\omega_0} = \hat g_{n,\lambda}(z_0) \pm 1.96\,(nh^\dagger)^{-1/2}\sigma\sigma_{\omega_0}$.

Since $\sigma_{\omega_0}^2 = I_2/\pi$, the lengths of the C.I.s (4.1) (based on the periodic spline) and (5.10) (based on the cubic spline) surprisingly coincide with each other. Another useful application of Proposition 5.2 is to find the value of $c_0$ needed in the local LRT when $\mathcal{H} = H^2(\mathbb{I})$; see Theorem 4.3. According to the definition of $c_0$ in (4.10), we have $c_0 \sim \sigma_{z_0}^2/(hK(z_0,z_0))$. Under (5.1), we can show $K(z_0,z_0) \sim h^{-1}\omega(0) = h^{-1}\sigma^{3/2}\omega_0(0) = 0.3535534\,h^{-1}\sigma^{3/2}$. Since $\sigma_{z_0}^2 \sim \sigma^{3/2}\sigma_{\omega_0}^2$ and $\sigma_{\omega_0}^2 = I_2/\pi$, we have $c_0 = 0.75$. This value coincides with the one found for periodic splines, i.e., $\mathcal{H} = H_0^2(\mathbb{I})$. These somewhat surprising phenomena have never been observed in the literature and may be used to facilitate the construction of the C.I. and the local LRT in practice.

Remark 5.3. Genovese and Wasserman (2008) showed that when $g_0$ belongs to an $m$-th order Sobolev ball, the lower bound for the average length of the SCB is proportional to $b_nn^{-m/(2m+1)}$ with $b_n$ depending only on $\log n$. We next show that the (minimum) bandwidth of our SCB can achieve this lower bound with $b_n = (\log n)^{(m+1)/(2m+1)}$. Based on Theorem 5.1, the bandwidth of our SCB has the shrinking rate $d_n(nh)^{-1/2}$, where $d_n$ is of the order $\sqrt{\log n}$; see Remark 5.1. Meanwhile, Condition (5.7) is crucial for our band to maintain the desired coverage probability. Suppose that the Fourier coefficients of $g_0$ satisfy the condition in Remark 3.3. It can be verified that (5.7) holds when $nh^{2m+1}\log n = O(1)$, which sets an upper bound for $h$. When $h$ is chosen as this upper bound, i.e., $h \asymp (n\log n)^{-1/(2m+1)}$, and $d_n \asymp \sqrt{\log n}$, our SCB achieves its minimum bandwidth, i.e., $n^{-m/(2m+1)}(\log n)^{(m+1)/(2m+1)}$, which turns out to be rate optimal according to [20].

In practice, the construction of our SCB requires a delicate choice of $(h, \varphi)$. Otherwise, over- or under-coverage of the true function may occur near the boundary points. Unfortunately, as pointed out by [5], there is no practical or theoretical guideline on how to find the optimal $(h, \varphi)$, although one can choose a proper $h$ to make the band as thin as possible. Hence, in the next section, we propose a more practically feasible approach to explore the global behavior, which only requires the tuning of $h$. Moreover, we are able to specify an optimal $h$ under which our likelihood-ratio-based approach achieves the optimal minimax rate of hypothesis testing specified by Ingster (1993).
5.2. Global Likelihood Ratio Test. Nonparametric hypothesis testing is of equal importance in studying global behavior; see an overview and references in [23]. There is a vast literature dealing with this problem, among which the Generalized Likelihood Ratio Testing (GLRT) ([18]) arises as a fundamental approach. For technical tractability, Fan et al. (2001) focused only on local polynomial fitting in the GLRT; also see [19] for the sieve extension. Based on the smoothing spline estimate, we propose an alternative method called the Penalized Likelihood Ratio Testing (PLRT), which applies not only to simple hypotheses but also to a very general class of composite hypotheses; see Remark 5.4. The null limit distribution is proven to be nearly $\chi^2$ with diverging degrees of freedom. Therefore, the Wilks phenomenon observed in the local LRT continues to hold in nonparametric penalized likelihood but in a more nonparametric form. Besides the much more concise assumptions, one major advantage of our PLRT over the GLRT is that the specifications of the former's null limit distribution are determined only by the parameter space, while the latter's heavily depend on the choice of kernel function; see Table 2 in [18]. In other words, the PLRT is closer to the nature of nonparametric models. Furthermore, we show that the PLRT achieves the optimal minimax rate for hypothesis testing in the sense of Ingster (1993). In practice, the power of the PLRT is better than that of the GLRT for small sample sizes in both periodic and non-periodic splines; see Example 6.1. In summary, our PLRT is not only intuitive to use but also powerful to apply. In contrast, most other smoothing-spline-based tests, e.g., the LMP and GML tests ([13, 57, 27, 8, 43]), use ad-hoc discrepancy measures leading to complicated null distributions with nuisance parameters, and have not addressed the optimality issues at all. Hence, their applicability is restricted; see more review in [34].

Consider the following "global" hypothesis:

(5.11) $H_0^{global}: g = g_0$ versus $H_1^{global}: g \in \mathcal{H} - \{g_0\}$,

where $g_0 \in \mathcal{H}$ can be either known or unknown. The PLRT statistic is defined as

(5.12) $PLRT_{n,\lambda} = \ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(\hat g_{n,\lambda})$.

Even when $g_0$ is unknown, the limit distribution of the PLRT under $H_0^{global}$ can still be derived, though the value of the test statistic is not calculable. More importantly, this property can be used to test composite hypotheses; see Remark 5.4.

Theorem 5.3 below derives the null limiting distribution of $PLRT_{n,\lambda}$ based on the FBR result.

Theorem 5.3. Let Assumptions A.1 through A.3 be satisfied. Also assume that $nh^{2m+1} = O(1)$, $nh^2 \to \infty$, $a_n = o(\min\{r_n, n^{-1}r_n^{-1}h^{-1/2}(\log n)^{-1}, n^{-1/2}(\log n)^{-1}\})$ and $r_n^2h^{-1/2} = o(a_n)$. Furthermore, assume that, under $H_0^{global}$, $E\{\epsilon^4 \mid Z\} \le C$, a.s., for some constant $C > 0$, where $\epsilon = \dot\ell_a(Y; g_0(Z))$ represents the "model error". Under $H_0^{global}$, we have

(5.13) $(2u_n)^{-1/2}\big(-2nr_K\cdot PLRT_{n,\lambda} - nr_K\|W_\lambda g_0\|^2 - u_n\big) \xrightarrow{d} N(0,1)$,

where $u_n = h^{-1}\sigma_K^4/\rho_K^2$, $r_K = \sigma_K^2/\rho_K^2$,

(5.14) $\sigma_K^2 = hE\{\epsilon^2K(Z,Z)\} = \sum_\nu\frac{h}{1+\lambda\gamma_\nu}$, $\rho_K^2 = hE\{\epsilon_1^2\epsilon_2^2K(Z_1,Z_2)^2\} = \sum_\nu\frac{h}{(1+\lambda\gamma_\nu)^2}$,

and $(\epsilon_i, Z_i)$, $i = 1, 2$ are iid copies of $(\epsilon, Z)$.

Direct examination reveals that $h \asymp n^{-d}$ with $\frac{1}{2m+1} \le d < \frac{2m}{8m-1}$ satisfies the rate conditions required in the above theorem when $m > (3+\sqrt{5})/4 \approx 1.309$. In Theorem 5.4, we further show that some particular choice of $h$ in the above range will guarantee the minimax optimality of the PLRT.

Theorem 5.3 implies that $-2nr_K\cdot PLRT_{n,\lambda}$ is asymptotically $N(u_n, 2u_n)$ since $n\|W_\lambda g_0\|^2 = o(h^{-1}) = o(u_n)$, as implied by the proof of Theorem 5.3 and the definition of $u_n$. As $n$ approaches $\infty$, i.e., $u_n \to \infty$, we know that $N(u_n, 2u_n)$ is nearly the same as $\chi^2_{u_n}$ in distribution. Hence, $-2nr_K\cdot PLRT_{n,\lambda}$ is approximately distributed as $\chi^2_{u_n}$, denoted as

(5.15) $-2nr_K\cdot PLRT_{n,\lambda} \overset{a}{\sim} \chi^2_{u_n}$.

Therefore, we claim that the fundamental Wilks phenomenon also holds under nonparametric penalized estimation but in a more nonparametric form, i.e., with diverging degrees of freedom. The specifications of (5.15), i.e., $\sigma_K^2$ and $\rho_K^2$, are determined only by the parameter space and the model setup. This is in stark contrast with the null limit distribution of the GLRT, whose specifications vary with the kernel function used; see Table 2 of [18]. Unfortunately, there is no theoretical guideline for choosing the most suitable kernel function. Hence, our PLRT tests the nonparametric models in a more fundamental way. In addition, we find that under-smoothing is not needed in carrying out the valid global testing, i.e., (5.15), unlike the other inference procedures.

We next discuss the calculation of $(r_K, u_n)$ and its implications in some important setups. In the setting of Proposition 5.2, we can show $\sigma_K^2 = h\sigma^{-2}\int_0^1K(z,z)\,dz \sim h\sigma^{-2}(h^{-1}\omega(0)) = \sigma^{-1/2}\omega_0(0) = 0.3535534\,\sigma^{-1/2}$ by applying this proposition. Similarly, we have $\rho_K^2 \sim \sigma^{-1/2}\sigma_{\omega_0}^2 = 0.265165\,\sigma^{-1/2}$. So $r_K = 1.3333$ and $u_n = 0.4714\,h^{-1}\sigma^{-1/2}$. Surprisingly, if we replace $H^2(\mathbb{I})$ by $H_0^2(\mathbb{I})$ in the above setup, our direct calculations in Case (I) of Example 6.1 reveal that $(r_K, u_n)$ take exactly the same values. We also note that $r_K \to 1$ when $\mathcal{H} = H_0^m(\mathbb{I})$ as the degree of smoothness $m$ tends to $\infty$. This is consistent with the scaling constant 2 in the classical likelihood ratio theory. Note that the possibly unknown parameter $\sigma$ in $u_n$ can be essentially profiled out without affecting the null limit distribution. We keep it here only for consistency with our general modeling framework. Alternatively, we can directly simulate the null limit distribution by fixing the nuisance parameters, e.g., the null value $g_0$, at reasonable values or estimates (e.g., by wild bootstrap) even without calculating the values of $(r_K, u_n)$. This is one major advantage of Wilks-type results.
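
The constants above translate into a simple decision rule. The sketch below is an illustration under stated assumptions rather than the procedure used in the simulations: it takes the two numerical constants quoted for $m = 2$, uses the normal approximation $N(u_n, 2u_n)$, and rejects for large values of the scaled statistic. The PLRT value and the settings for n, h and sigma are made up for the example.

```python
from statistics import NormalDist

SIGMA2_K = 0.3535534   # sigma_K^2 * sigma^(1/2), quoted in the text for m = 2
RHO2_K = 0.265165      # rho_K^2 * sigma^(1/2), quoted in the text for m = 2

def plrt_constants(h, sigma):
    """Return (r_K, u_n) with r_K = sigma_K^2 / rho_K^2, u_n = sigma_K^4 / (h rho_K^2)."""
    sigma2_k = SIGMA2_K / sigma ** 0.5
    rho2_k = RHO2_K / sigma ** 0.5
    return sigma2_k / rho2_k, sigma2_k ** 2 / (h * rho2_k)

def reject_global_null(plrt_value, n, h, sigma, alpha=0.05):
    """One-sided test treating -2 n r_K PLRT as approximately N(u_n, 2 u_n)."""
    r_k, u_n = plrt_constants(h, sigma)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return -2.0 * n * r_k * plrt_value > u_n + z * (2.0 * u_n) ** 0.5

r_k, u_n = plrt_constants(h=0.05, sigma=1.0)
print(r_k, u_n)
print(reject_global_null(plrt_value=-0.02, n=500, h=0.05, sigma=1.0))
```

Because $r_K$ is a ratio of the two constants, it does not depend on $\sigma$; only $u_n$ does.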

Remark 5.4. In this remark, we discuss composite hypothesis testing via the PLRT and the related Wilks phenomenon. Specifically, we are able to test whether $g$ belongs to some finite dimensional class of functions with bounded Sobolev norm, which is much larger than the null space $\mathcal{N}_m$ considered in the literature. As an example, we consider testing, for any integer $q \ge 0$,

(5.16) $H_0^{global}: g \in \mathcal{L}_q(\mathbb{I})$,

where $\mathcal{L}_q(\mathbb{I}) \equiv \{g(z) = \sum_{l=0}^q a_lz^l : a = (a_0, a_1, \ldots, a_q)^T \in \mathbb{R}^{q+1}\}$ represents the class of $q$-th order polynomials over $\mathbb{I}$. Let $\hat a^* = \arg\max_{a\in\mathbb{R}^{q+1}}\{(1/n)\sum_{i=1}^n\ell(Y_i; \sum_{l=0}^q a_lZ_i^l) - (\lambda/2)a^TDa\}$, where $D = \int_0^1(0, 0, 2, 6z, \ldots, q(q-1)z^{q-2})^T(0, 0, 2, 6z, \ldots, q(q-1)z^{q-2})\,dz$ is a $(q+1)\times(q+1)$ matrix. Hence, under $H_0^{global}$, the penalized MLE is $\hat g^*(z) = \sum_{l=0}^q\hat a^*_lz^l$. Let $g_{0q}$ denote some unknown "true" parameter in $\mathcal{L}_q(\mathbb{I})$ with some polynomial coefficient $a^0 = (a^0_0, a^0_1, \ldots, a^0_q)^T$. For testing the composite hypothesis (5.16), we first decompose the PLRT statistic $PLRT^{com}_{n,\lambda}$ as $L_{n1} - L_{n2}$, where $L_{n1} = \ell_{n,\lambda}(g_{0q}) - \ell_{n,\lambda}(\hat g_{n,\lambda})$ and $L_{n2} = \ell_{n,\lambda}(g_{0q}) - \ell_{n,\lambda}(\hat g^*)$. By formulating

$H_0': a = a^0$ versus $H_1': a \neq a^0$,

we notice that $L_{n2}$ appears to be the PLRT in the parametric setup. We can prove the order of $L_{n2}$ as $O_P(n^{-1})$ no matter $q < m$ (by applying the parametric theory in [47]) or $q \ge m$ (by slightly modifying the proof of Theorem 4.4). On the other hand, $L_{n1}$ is exactly the PLRT for testing

$H_0^{global'}: g = g_{0q}$ versus $H_1^{global'}: g \neq g_{0q}$.

Since Theorem 5.3 also applies to the unknown null value $g_{0q}$, $L_{n1}$ follows the limit distribution (5.15). So does $PLRT^{com}_{n,\lambda}$ under the composite hypothesis (5.16), considering $L_{n2} = O_P(n^{-1})$.
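
The penalty matrix $D$ has a closed form that is convenient when fitting the polynomial null model. The sketch below assumes, as the displayed vector of second derivatives suggests, that the penalty is $J(g,g) = \int_0^1 g''(z)^2\,dz$; the function names are hypothetical.

```python
from fractions import Fraction

def penalty_matrix(q):
    """(q+1) x (q+1) matrix D with entries int_0^1 (z^j)'' (z^k)'' dz.

    (z^j)'' = j (j - 1) z^(j - 2), so the (j, k) entry is
    j (j - 1) k (k - 1) / (j + k - 3) for j, k >= 2 and zero otherwise.
    """
    D = [[Fraction(0)] * (q + 1) for _ in range(q + 1)]
    for j in range(2, q + 1):
        for k in range(2, q + 1):
            D[j][k] = Fraction(j * (j - 1) * k * (k - 1), j + k - 3)
    return D

def penalty(a, D):
    """Quadratic form a^T D a, i.e. J(g, g) for g(z) = sum_l a[l] z^l."""
    return sum(a[j] * D[j][k] * a[k] for j in range(len(a)) for k in range(len(a)))

D = penalty_matrix(3)
print([[str(x) for x in row] for row in D])
print(penalty([5, -1, 0, 0], D))   # linear polynomials are unpenalized
print(penalty([0, 0, 1, 1], D))    # g(z) = z^2 + z^3
```

Exact fractions are used so that the entries can be compared against hand integration without rounding.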

To conclude this section, we remark that the PLRT achieves the optimal minimax rate of hypothesis testing specified in Ingster (1993). By developing a uniform version of the FBR, we rigorously prove this claim in Theorem 5.4. For convenience, we only consider $\ell(Y;a) = -(Y-a)^2/2$. The extension to the more general setup can be found in [46] under stronger assumptions, e.g., a more restrictive $\mathcal{G}_a$ defined below. Write the local alternative as $H_{1n}: g = g_{n0}$, where $g_{n0} = g_0 + g_n$, $g_0 \in H^m(\mathbb{I})$ and $g_n$ belongs to the alternative value set $\mathcal{G}_a = \{g \in H^m(\mathbb{I}) \mid Var(g(Z)^2) \le \zeta E^2\{g(Z)^2\},\ J(g,g) \le \zeta\}$ for some constant $\zeta > 0$.

Theorem 5.4. Let m > (3 + \/5)/4 « 1.309, and h X n~ d for < d < %pp^±. Suppose 

that Assumption A.2 is satisfied, and uniformly over $g_n \in \mathcal{G}_a$, $\|\hat{g}_{n,\lambda} - g_{n0}\| = O_P(r_n)$ holds under $H_{1n}: g = g_{n0}$. Then for any $\delta \in (0, 1)$, there exist positive constants $C$ and $N$ such that

(5.17)  $\inf_{n \geq N}\ \inf_{g_n \in \mathcal{G}_a,\ \|g_n\| \geq C\eta_n} P(\text{reject } H_0^{\mathrm{global}} \mid H_{1n} \text{ is true}) \geq 1 - \delta$,

where $\eta_n \geq \sqrt{h^{2m} + (nh^{1/2})^{-1}}$. The minimal lower bound of $\eta_n$, i.e., $n^{-2m/(4m+1)}$, is achieved when $h = h^{**} = n^{-2/(4m+1)}$.

The condition "uniformly over $g_n \in \mathcal{G}_a$, $\|\hat{g}_{n,\lambda} - g_{n0}\| = O_P(r_n)$ holds under $H_{1n}: g = g_{n0}$" means that for any $\delta > 0$, there exist constants $C$ and $N$, both unrelated to $g_n \in \mathcal{G}_a$, such that

$\inf_{n \geq N}\ \inf_{g_n \in \mathcal{G}_a} P_{g_{n0}}\left(\|\hat{g}_{n,\lambda} - g_{n0}\| \leq C r_n\right) \geq 1 - \delta$.

Theorem 5.4 proves that, when $h = h^{**}$, the PLRT can detect any local alternatives with a separation rate no faster than $n^{-2m/(4m+1)}$, which turns out to be the optimal minimax rate in the sense of [26]; see more discussion in Remark 5.5. The above rates are consistent with those derived in local polynomial estimation ([18]), although our nonparametric models are more general and the conditions in Theorem 5.4 are more concise. In contrast with the local LRT studied in Theorem 4.6, we note the interesting fact that two different smoothing parameters are employed for obtaining the minimum separation rates, i.e., $\lambda = \lambda^{*} = n^{-2m/(2m+1)}$ for the local testing and $\lambda = \lambda^{**} = n^{-4m/(4m+1)}$ for the global testing. Such a distinction might be caused by the different nature of these two tests, i.e., local vs. global, which is reflected by their different minimum separation rates; see the discussion right below Theorem 4.6. In Example 6.1, a simulation study was conducted to compare the powers of the PLRT and GLRT for both periodic and non-periodic splines; see Tables 3 & 4. As $n$ grows, we find that the powers of both tests rapidly approach one, and, more interestingly, that the PLRT appears to be more powerful for small sample sizes such as $n = 20$.
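As a quick numerical companion to Theorem 5.4, the following minimal sketch evaluates the squared bound $h^{2m} + (nh^{1/2})^{-1}$ and locates its minimizer in $h$. The function names are illustrative and not part of the paper; the closed-form minimizer $(4mn)^{-2/(4m+1)}$ is obtained by setting the derivative in $h$ to zero, so it agrees with $h^{**} = n^{-2/(4m+1)}$ only up to a constant factor.

```python
import math

def eta_squared(h, n, m):
    """Squared lower bound for eta_n in Theorem 5.4: h^(2m) + 1/(n*sqrt(h))."""
    return h ** (2 * m) + 1.0 / (n * math.sqrt(h))

def h_minimizer(n, m):
    """Exact minimizer of eta_squared in h, from 2m*h^(2m-1) = h^(-3/2)/(2n)."""
    return (4.0 * m * n) ** (-2.0 / (4 * m + 1))

def rate_exponents(m):
    """Exponents of n for the quantities quoted in the text."""
    return {
        "h_global": -2.0 / (4 * m + 1),          # h** = n^(-2/(4m+1))
        "separation": -2.0 * m / (4 * m + 1),    # eta_n = n^(-2m/(4m+1))
        "lambda_global": -4.0 * m / (4 * m + 1), # lambda** = (h**)^(2m)
        "lambda_local": -2.0 * m / (2 * m + 1),  # lambda* for the local test
    }

if __name__ == "__main__":
    m, n = 2, 10_000
    h_star = n ** rate_exponents(m)["h_global"]
    print("h** =", h_star, " exact minimizer =", h_minimizer(n, m))
    print("eta_n at h** =", math.sqrt(eta_squared(h_star, n, m)))
```

At $h = h^{**}$ both terms of the bound equal $n^{-4m/(4m+1)}$, so the separation rate is $\sqrt{2}\, n^{-2m/(4m+1)}$; the two terms are balanced, which is why a different $\lambda$ is optimal here than for the local test.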

Remark 5.5. We note that the optimal minimax rate of testing established in Ingster (1993) is under the usual $\|\cdot\|_{L_2}$-norm (w.r.t. Lebesgue measure). However, our minimum separation rate derived under the $\|\cdot\|$-norm is still optimal due to the trivial domination of $\|\cdot\|$ over $\|\cdot\|_{L_2}$ (under the conditions of Theorem 5.4). We next heuristically explain why the minimax rates of testing associated with $\|\cdot\|$, denoted as $b'_n$, and with $\|\cdot\|_{L_2}$, denoted as $b_n$, are essentially the same under the conditions of Theorem 5.4, which may not be easy to see. By definition, whenever $\|g_n\| \geq b'_n$ or $\|g_n\|_{L_2} \geq b_n$, $H_0^{\mathrm{global}}$ can be rejected with a large probability, or equivalently, the local alternatives can be detected. Note that $b'_n$ and $b_n$ are the minimum rates that satisfy this property. Ingster (1993) has shown that $b_n \asymp n^{-2m/(4m+1)}$. Since $\|g_n\|_{L_2} \geq b'_n$ implies $\|g_n\| \geq b'_n$, $H_0^{\mathrm{global}}$ is rejected. This means $b'_n$ is an upper bound for detecting local alternatives in terms of $\|\cdot\|_{L_2}$, and so $b_n \leq b'_n$. On the other hand, suppose $h = h^{**} \asymp n^{-2/(4m+1)}$ and $\|g_n\| \geq C n^{-2m/(4m+1)} \asymp b_n$ for some large $C > \zeta^{1/2}$. Since $\lambda J(g_n, g_n) \leq \zeta\lambda \asymp \zeta n^{-4m/(4m+1)}$, it follows that $\|g_n\|_{L_2} \geq (C^2 - \zeta)^{1/2} n^{-2m/(4m+1)} \asymp b_n$. This means $b_n$ is an upper bound for detecting the local alternatives in terms of $\|\cdot\|$, and so $b'_n \leq b_n$. Therefore, $b'_n$ and $b_n$ are of the same order.

6. Examples. This section illustrates the applicability of our theories with three examples, 
and demonstrates the empirical performances of our inference procedures via some simulations. 

Example 6.1. (L2 Regression) Consider the nonparametric regression model

(6.1)  $Y = g_0(Z) + \epsilon$,

where $\epsilon \sim N(0, \sigma^2)$ with unknown $\sigma^2$. Hence, we have $I(Z) = \sigma^{-2}$ and $V(g, \tilde{g}) = \sigma^{-2} E\{g(Z)\tilde{g}(Z)\}$. For simplicity, we assume that the true value of $\sigma$ is one and $Z$ is uniformly distributed over $\mathbb{I}$. In the simulations, the unknown $\sigma$ can be either consistently estimated or profiled out as in [18]. The function "ssr()" in the R package assist was used to select the smoothing parameter $\lambda$, i.e., $h$, based on CV or GCV; see [58]. Note that, in the simulations, we implicitly perform under-smoothing using the GCV-selected smoothing parameter since the employed test function is sufficiently smooth; see Remark 3.1. We first consider $\mathcal{H} = H_0^m(\mathbb{I})$ in Case (I), and then $\mathcal{H} = H^m(\mathbb{I})$ in Case (II).

Case (I). $\mathcal{H} = H_0^m(\mathbb{I})$: In this case, we can choose the basis functions $h_\mu$'s as

(6.2)  $h_\mu(z) = \begin{cases} \sigma, & \mu = 0, \\ \sqrt{2}\,\sigma\cos(2\pi k z), & \mu = 2k,\ k = 1, 2, \ldots, \\ \sqrt{2}\,\sigma\sin(2\pi k z), & \mu = 2k - 1,\ k = 1, 2, \ldots, \end{cases}$

with the eigenvalues $\gamma_{2k-1} = \gamma_{2k} = \sigma^2(2\pi k)^{2m}$ for $k \geq 1$ and $\gamma_0 = 0$. Assumption A.2 is trivially satisfied for the above choice of $(h_\mu, \gamma_\mu)$'s. We first prove a useful lemma below.

Lemma 6.1. Recall that $I_l = \int_0^{\infty} (1 + x^{2m})^{-l}\,dx$ for $l = 1, 2$ and $h^{\dagger} = h\sigma^{1/m}$. Then, we have

(6.3)  $\sum_{k=1}^{\infty} \frac{1}{(1 + (2\pi h^{\dagger} k)^{2m})^{l}} \sim \frac{I_l}{2\pi h^{\dagger}}$.

Proposition 4.1 implies the asymptotic 95% point-wise C.I. for $g(z_0)$ as $\hat{g}_{n,\lambda}(z_0) \pm 1.96\,\sigma_{z_0}/\sqrt{nh}$ by choosing a proper $h$; see (3.13). To obtain an explicit form of $\sigma^2_{z_0}$, which is the limit of $hV(K_{z_0}, K_{z_0})$ as $h \to 0$, we note that $hV(K_{z_0}, K_{z_0}) = \sigma^2 h\left(1 + 2\sum_{k=1}^{\infty}(1 + (2\pi h^{\dagger} k)^{2m})^{-2}\right) \sim (I_2\sigma^{2-1/m})/\pi$ based on Lemma 6.1. Hence, in practice, we use

(6.4)  $\hat{g}_{n,\lambda}(z_0) \pm 1.96\,\hat{\sigma}^{1-1/(2m)}\sqrt{I_2/(\pi n h)}$,

where $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n}(Y_i - \hat{g}_{n,\lambda}(Z_i))^2$. Alternatively, according to Theorem 4.4, we can also establish the asymptotic C.I. by inverting the local likelihood ratio. The above trigonometric basis (6.2) gives

$K(z_0, z_0) = \sigma^2 + \sum_{k \geq 1}\left\{\frac{|h_{2k}(z_0)|^2}{1 + \lambda\sigma^2(2\pi k)^{2m}} + \frac{|h_{2k-1}(z_0)|^2}{1 + \lambda\sigma^2(2\pi k)^{2m}}\right\}$
$\quad = \sigma^2 + 2\sigma^2\sum_{k \geq 1}(1 + \lambda\sigma^2(2\pi k)^{2m})^{-1} = \sigma^2 + 2\sigma^2\sum_{k \geq 1}(1 + (2\pi h^{\dagger} k)^{2m})^{-1}$.

Combining (4.12) with Lemma 6.1, we have $c_0 = I_2/I_1$. Hence, $c_0 = 0.75$ $(0.83)$ when $m = 2$ $(3)$.
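The constants above can be checked numerically. The sketch below is illustrative only: it evaluates $I_l$ through the standard Beta-function identity $\int_0^\infty (1+x^{2m})^{-l}dx = \frac{1}{2m}B\left(\frac{1}{2m}, l - \frac{1}{2m}\right)$ (obtained by substituting $u = x^{2m}$), forms $c_0 = I_2/I_1$, and compares a truncated version of the sum in (6.3) with $I_l/(2\pi h^{\dagger})$. The function names and the truncation length are assumptions made for this example.

```python
import math

def I_l(m, l):
    """I_l = int_0^inf (1 + x^(2m))^(-l) dx, via the Beta-function identity."""
    a = 1.0 / (2 * m)
    return a * math.gamma(a) * math.gamma(l - a) / math.gamma(l)

def c0(m):
    """c_0 = I_2 / I_1 for the trigonometric basis of Case (I)."""
    return I_l(m, 2) / I_l(m, 1)

def lemma_sum(m, l, h_dagger, terms=200_000):
    """Truncated left-hand side of (6.3)."""
    return sum(
        (1.0 + (2.0 * math.pi * h_dagger * k) ** (2 * m)) ** (-l)
        for k in range(1, terms + 1)
    )

if __name__ == "__main__":
    print("c0(m=2) =", round(c0(2), 4), " c0(m=3) =", round(c0(3), 4))
    h_dagger = 0.001
    for l in (1, 2):
        approx = I_l(2, l) / (2.0 * math.pi * h_dagger)
        print("l =", l, " sum/approx =", lemma_sum(2, l, h_dagger) / approx)
```

The identity also gives $c_0 = 1 - 1/(2m)$ in closed form, which matches the quoted values for $m = 2$ and $m = 3$.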

In Table 2 below, we compare the coverage probability (CP) of our asymptotic C.I. (6.4), denoted as ACI, with that of Nychka's Bayesian C.I. (4.7), denoted as NCI, at the three quartiles $(Q_1, Q_2, Q_3)$ of the observed covariates $Z$. We assume the true periodic function $g_0(z) = 12\sin(\pi z)$ and estimate it using a periodic spline with $m = 2$. The CP was computed as the proportion of the C.I.s that cover $g_0$ at that point over 10,000 replications. From Table 2, it is observed that the CPs of the ACIs and NCIs are both reasonably close to the 95% nominal level. However, as $n$ grows, the CPs of the ACIs get closer to 95% while those of the NCIs stay somewhat above 95% with an increasing gap, in particular when $n = 800$. This somewhat unsatisfactory performance of the NCI is consistent with the observations in [40]. Besides the better CP, our ACI also has shorter length; see Table 2. Our simulation results empirically support our claim in Section 4.1 that the Bayesian C.I. has biased coverage probability and larger interval length.



              Q_1              Q_2              Q_3
   n      NCI     ACI      NCI     ACI      NCI     ACI
  100    95.12   93.74    95.43   94.17    95.33   93.99
  200    95.94   94.64    95.75   94.47    95.79   94.51
  300    95.81   94.60    95.97   94.74    95.92   94.62
  400    95.93   94.60    96.03   94.90    95.92   94.60
  800    96.20   94.75    96.14   94.94    96.34   95.15

Table 2

Comparison of 100 x CP% of CIs in Case (I). The lengths of the NCIs are 1.14, 0.88, 0.75, 0.68, 0.52, and those of the ACIs are 1.08, 0.83, 0.71, 0.64, 0.49, for n = 100, 200, 300, 400, 800. Nominal level is 95%.

In Figure 1, we constructed the SCB for $g$ over $(0, 1)$ based on (5.9) by taking $d_n = (-2\log h)^{1/2}$, and compared it with three so-called point-wise confidence bands constructed by linking the endpoints of the ACI (6.4), Wahba's Bayesian C.I. (4.3) and the NCI (4.7) at each observed covariate, denoted as ACB, BCB1 and BCB2, respectively. Data were generated under the same setup as above. From Figure 1, it is observed that the coverage properties of all the confidence bands are reasonably good, and improve as $n$ grows. Meanwhile, all band areas clearly shrink as $n$ grows. We also note that the ACBs possess the smallest band area, while the SCBs have the largest one, which is not surprising given its definition. The more technical reason is the $d_n$ factor in the construction of the SCB, which is of order $\sqrt{\log n}$; see Remark 5.1.

At the end of Case (I), we considered testing $H_0$: $g$ is linear at the 95% significance level by both our PLRT and the GLRT ([18]). By Lemma 6.1 and (6.2), some direct calculations reveal that $r_K = 1.3333$ and $u_n = 0.4714\,(h\sigma^{1/2})^{-1}$ in (5.15) when $m = 2$. In the simulations, we replaced $\sigma$ by $\hat{\sigma}$ defined above. Data were generated under the same setup except that a more linear true function $g(z) = 3.2\sin(\pi z)$ (than the previous $g(z) = 12\sin(\pi z)$) was used for the purpose of power comparison. For the GLRT method, the Epanechnikov kernel function was used with the R function "glkerns()". For the PLRT method, GCV was used to select the smoothing parameter, considering the slight difference between $h^{*}$ and $h^{**}$. Table 3 compares the powers (proportions of rejections in 10,000 replications) for four sample sizes. When $n = 40$ or larger, both test methods achieve almost 100% power. We also note that the PLRT shows a moderate advantage in smaller samples even though the chosen smoothing parameter (by GCV) is not optimal in terms of testing. An intuitive reason is that the smoothing spline estimate in the PLRT uses the full data information, while the local polynomial estimate used in the GLRT only uses local data information, which might not be sufficient when the sample size is small. Of course, as $n$ grows, such a difference rapidly vanishes due to the increasing data information.
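The two constants quoted for (5.15) can be reproduced from $I_1$ and $I_2$. The sketch below is a hedged illustration: the definitions of these constants are not restated in this section, so the formulas used here, $I_1/I_2$ for the first constant and $I_1^2/(\pi I_2)$ for the coefficient of $(h\sigma^{1/m})^{-1}$ in $u_n$, are assumptions chosen because they reproduce both quoted values at $m = 2$. The function names are hypothetical.

```python
import math

def I_l(m, l):
    """I_l = int_0^inf (1 + x^(2m))^(-l) dx, via the Beta-function identity."""
    a = 1.0 / (2 * m)
    return a * math.gamma(a) * math.gamma(l - a) / math.gamma(l)

def plrt_constants(m):
    """Assumed forms of the constants in (5.15) for the trigonometric basis.

    Returns (ratio, u_coef), where ratio = I_1 / I_2 and
    u_n = u_coef * (h * sigma^(1/m))^(-1).
    Both expressions are assumptions, not taken from the text.
    """
    i1, i2 = I_l(m, 1), I_l(m, 2)
    return i1 / i2, i1 ** 2 / (math.pi * i2)

if __name__ == "__main__":
    ratio, u_coef = plrt_constants(2)
    print(round(ratio, 4), round(u_coef, 4))
```

For $m = 2$ these evaluate to $4/3$ and $\sqrt{2}/3$, matching the rounded values 1.3333 and 0.4714 given above.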
[Figure 1 panels: BCB1, BCB2, ACB and SCB, shown for n = 200 and n = 400.]

Fig 1. 95% point-wise and simultaneous confidence bands for periodic g in Case (I). The upper and lower bands are indicated by green curves, while the central black curve represents the true function. The numerical band area is denoted as "ba".



Case (II). $\mathcal{H} = H^m(\mathbb{I})$: For this more general $\mathcal{H}$, we repeated most of the inference procedures in Case (I) by assuming the non-periodic true function $g(z) = 6\sin(2.8\pi z)$ and using the cubic spline for estimation. Hence, we only point out the differences. Figure 2 summarizes the simultaneous confidence band and point-wise confidence bands, in which BCB1 was computed by (4.2) and BCB2 was constructed by scaling the length of BCB1 by a factor $\sqrt{27/32} \approx 0.919$. We tested the linearity of $g$ at significance level 95%, and assumed $g(z) = 1.5\sin(2.8\pi z)$. Table 4 summarizes the powers of the PLRT and GLRT. From Figure 2 and Table 4, we conclude that all the observations and findings for the periodic spline in Case (I) remain the same for the non-periodic spline.

                 100 x Power%
          n = 20   n = 30   n = 40   n = 100
  PLRT     92.7     98.6     99/T     100
  GLRT     90.1     98.2     99.7     100

Table 3

Power comparison of the PLRT and GLRT for four sample sizes in Case (I). Significance level is 95%.

                 100 x Power%
          n = 20   n = 30   n = 40   n = 100
  PLRT     92.5     99T      99/f     100
  GLRT     90.3     98.9     99.5     100

Table 4

Power comparison of the PLRT and GLRT for four sample sizes in Case (II). Significance level is 95%.

Example 6.2. (Nonparametric Gamma Model) Consider the two-parameter exponential model

$Y \mid Z \sim \mathrm{Gamma}(\alpha, \exp(g_0(Z)))$,

where $\alpha > 0$, $g_0 \in H_0^m(\mathbb{I})$ and $Z$ is uniform over $[0, 1]$. This framework corresponds to $\ell(y; g(z)) = \alpha g(z) + (\alpha - 1)\log y - y\exp(g(z))$. Thus, it can be shown that $I(z) = \alpha$, leading us to choose the basis functions $h_\mu$'s as defined in (6.2) with $\sigma = \alpha^{-1/2}$, and the eigenvalues to be $\gamma_{2k} = \gamma_{2k-1} = \alpha^{-1}(2\pi k)^{2m}$ for $k \geq 1$, and $\gamma_0 = 0$. One can conduct the local and global inferences in a similar manner as in Case (I) of Example 6.1.
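The claim $I(z) = \alpha$ in Example 6.2 can be checked by simulation. The following sketch is illustrative and assumes that the second argument of the Gamma distribution is a rate, which is what the stated log-likelihood implies; the function names, sample size and seed are choices made for this example. It averages the negative second derivative of $\ell(y; a)$ in $a$, namely $y\exp(a)$, over simulated responses.

```python
import math
import random

def loglik(y, a, alpha):
    """Log-likelihood from Example 6.2, up to terms free of a."""
    return alpha * a + (alpha - 1.0) * math.log(y) - y * math.exp(a)

def neg_second_derivative(y, a):
    """-(d^2/da^2) loglik(y, a, alpha) = y * exp(a); it does not depend on alpha."""
    return y * math.exp(a)

def fisher_information_mc(alpha, a, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E{-l''(Y; a)} when Y ~ Gamma(shape=alpha, rate=exp(a))."""
    rng = random.Random(seed)
    scale = math.exp(-a)  # random.gammavariate takes (shape, scale)
    total = 0.0
    for _ in range(n_draws):
        total += neg_second_derivative(rng.gammavariate(alpha, scale), a)
    return total / n_draws

if __name__ == "__main__":
    print(fisher_information_mc(alpha=2.5, a=0.3))  # close to alpha
```

Since $E(Y \mid Z) = \alpha\exp(-g_0(Z))$ under this parametrization, the expectation equals $\alpha$ exactly for every value of $g_0(z)$, which is why the basis of Case (I) can be reused with $\sigma = \alpha^{-1/2}$.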

Example 6.3. (Nonparametric Logistic Regression) In this example, we consider the binary response $Y \in \{0, 1\}$ modeled by the following logistic model

(6.5)  $P(Y = 1 \mid Z = z) = \frac{\exp(g_0(z))}{1 + \exp(g_0(z))}$,

where $g_0 \in H^m(\mathbb{I})$. A straightforward calculation gives $I(z) = \frac{\exp(g_0(z))}{(1 + \exp(g_0(z)))^2}$. In this example, $c_0$ has no explicit form since the pair $(h_\nu, \gamma_\nu)$ has no explicit form. Therefore, we have to find an accurate estimate of $c_0$. To achieve this, we will use (2.11) to approximate the $h_\nu$'s and $\gamma_\nu$'s. Thus, accurate estimates of $I(z)$ and $\pi(z)$ are needed. Observe that $I(z) = P(Y = 1 \mid Z = z)P(Y = 0 \mid Z = z)$. To approximate $I(z)$, we thus have to plug in an estimate of $P(Y = 1 \mid Z = z)$. Note that $P(Y = 1 \mid Z = z) = [P(Z = z \mid Y = 1)P(Y = 1)]/P(Z = z)$. Denote $\pi_1(z) = P(Z = z \mid Y = 1)$, $r = P(Y = 1)$ and $\pi(z) = P(Z = z)$. Let $\hat{\pi}_1$ and $\hat{\pi}$ be consistent estimates of $\pi_1$ and $\pi$, such as kernel density estimators. Let $\hat{r}$ be the proportion of $Y = 1$, which is a consistent estimate of $r$. Then we can approximate $I(z)$ by $\hat{I}(z) = \frac{\hat{r}\hat{\pi}_1(z)}{\hat{\pi}(z)}\left(1 - \frac{\hat{r}\hat{\pi}_1(z)}{\hat{\pi}(z)}\right)$. One may find the approximated eigensystem $(\hat{h}_\nu, \hat{\gamma}_\nu)$'s by solving the approximate version of (2.11) in which $I(\cdot)$ and $\pi(\cdot)$ are replaced by $\hat{I}(\cdot)$ and $\hat{\pi}(\cdot)$, respectively. The approximated eigensystem is needed in the local and global inferences. For example, to perform the PLRT based on Theorem 5.3, we can use the $(\hat{h}_\nu, \hat{\gamma}_\nu)$'s to specify the null limiting distribution and the theoretical 95% cutoff value in (5.15). Meanwhile, we are also aware that solving the approximated eigensystem could be computationally tricky. Fortunately, in the PLRT, it can be avoided by directly simulating the null limit distribution, e.g., by the wild bootstrap in [37], as long as the Wilks type of result holds.
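The plug-in estimate $\hat{I}(z)$ described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Gaussian kernel density estimators with one common, user-supplied bandwidth for both $\hat{\pi}_1$ and $\hat{\pi}$, and the function names are hypothetical. With a common bandwidth, $\hat{r}\hat{\pi}_1(z)/\hat{\pi}(z)$ reduces to a kernel-weighted proportion of ones, so it automatically lies in $[0, 1]$.

```python
import math

def gaussian_kde(z, sample, bandwidth):
    """Gaussian kernel density estimate at z (returns 0.0 for an empty sample)."""
    if not sample:
        return 0.0
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((z - zi) / bandwidth) ** 2) for zi in sample) / norm

def information_plugin(z, Z, Y, bandwidth):
    """Plug-in estimate I_hat(z) = p_hat(z) * (1 - p_hat(z)),
    with p_hat(z) = r_hat * pi1_hat(z) / pi_hat(z).

    Assumes pi_hat(z) > 0, i.e. z is not far outside the range of the data.
    """
    r_hat = sum(Y) / len(Y)                               # proportion of Y = 1
    ones = [zi for zi, yi in zip(Z, Y) if yi == 1]
    pi1_hat = gaussian_kde(z, ones, bandwidth)            # density of Z given Y = 1
    pi_hat = gaussian_kde(z, Z, bandwidth)                # marginal density of Z
    p_hat = r_hat * pi1_hat / pi_hat
    return p_hat * (1.0 - p_hat)

if __name__ == "__main__":
    Z = [0.1, 0.2, 0.8, 0.9]
    Y = [1, 1, 0, 0]
    for z in (0.1, 0.5, 0.9):
        print(z, information_plugin(z, Z, Y, bandwidth=0.2))
```

In practice the bandwidth would be chosen by a data-driven rule, and the resulting $\hat{I}$ and $\hat{\pi}$ would then be substituted into the approximate version of (2.11); that eigenproblem is not attempted here.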

Acknowledgement: We appreciate helpful discussions with Professor Chong Gu.

APPENDIX 

A.1. Proof of Proposition 2.1. Based on the definition (2.8), we can write $\|g\|^2 = V(g, g) + \lambda J(g, g)$, and then plug in the Fourier expansion of $g$ to obtain the explicit expression of $\|g\|^2$. A direct calculation reveals that

(A.1)  $\langle g, h_\nu \rangle = \langle \sum_{\mu} V(g, h_\mu)h_\mu, h_\nu \rangle = V(g, h_\nu)(1 + \lambda\gamma_\nu)$,

for any $g \in H^m(\mathbb{I})$ and $\nu \in \mathbb{N}$. It follows by (A.1) that $V(K_z, h_\nu) = \langle K_z, h_\nu \rangle/(1 + \lambda\gamma_\nu) = h_\nu(z)/(1 + \lambda\gamma_\nu)$. Hence, we can obtain the expression of $K_z(\cdot)$ by considering $K_z(\cdot) = \sum_{\nu} V(K_z, h_\nu)h_\nu(\cdot)$. Furthermore, (A.1) implies that $V(W_\lambda h_\nu, h_\mu) = \langle W_\lambda h_\nu, h_\mu \rangle/(1 + \lambda\gamma_\mu) = \lambda\gamma_\mu\delta_{\nu\mu}/(1 + \lambda\gamma_\mu)$, for any $\nu, \mu \in \mathbb{N}$. In the end, we can conclude the proof of Proposition 2.1 by considering $W_\lambda h_\nu(\cdot) = \sum_{\mu} V(W_\lambda h_\nu, h_\mu)h_\mu(\cdot)$.

[Figure 2 panels: BCB1, BCB2, ACB and SCB, shown for n = 100, 200 and 400. Band areas ("ba") reported in the panels:]

    n       BCB1        BCB2        ACB         SCB
   100    1.092594    1.003611    0.9302145   2.4125
   200    0.840578    0.7721202   0.7099861   1.582709
   400    0.6030603   0.5539463   0.5083777   1.199456

Fig 2. 95% point-wise and simultaneous confidence bands for non-periodic g in Case (II). The upper and lower bands are indicated by green curves, while the central black curve represents the true function. The numerical band area is denoted as "ba".

A.2. Proof of Proposition 2.2. The usual $L_2$-inner product is defined to be $\langle g, \xi \rangle_{L_2} = \int_0^1 g(z)\xi(z)\,dz$. Let $D$ be the differential operator, i.e., $D\phi = \frac{d}{dz}\phi$, and $\omega = 1/(I\pi)$. Thus, $\omega \in C^m(\mathbb{I})$ is positive and finitely upper bounded. It follows from [6] that the growth rate of $\gamma_\nu$ is of order $\nu^{2m}$. Since the operator $L_0 = (-1)^m\omega D^{2m}$ is self-adjoint under the inner product $V$, that is, $V(L_0 g, \xi) = V(g, L_0\xi)$ for any $\xi, g \in C^{2m}(\mathbb{I})$ satisfying the boundary conditions in (2.11), the orthogonality and completeness of the $h_\nu$'s under $V$ thus follow from Theorem 2.1 (pp. 189) and Theorem 4.2 (pp. 199) of [11] with the usual $L_2$-inner product $\langle\cdot,\cdot\rangle_{L_2}$ replaced with $V$. Therefore, when the $h_\nu$'s are normalized to $V(h_\nu, h_\nu) = 1$, they form an orthonormal and complete set in $L_2(\mathbb{I}; V)$.

Next we show that $h_\nu^{(m)}/\sqrt{\gamma_\nu}$, $\nu \geq m$, are complete in $L_2(\mathbb{I})$ under $\langle\cdot,\cdot\rangle_{L_2}$. The idea follows the arguments on page 147 of [35]. The eigenspace corresponding to the zero eigenvalue contains functions $\phi$ that satisfy $(-1)^m\phi^{(2m)} = 0$ with boundary conditions $\phi^{(j)}(0) = \phi^{(j)}(1) = 0$ for $j = m, \ldots, 2m - 1$; thus, it follows from [54] that this eigenspace is $\mathcal{P}_{m-1}$, the set of all polynomials of degree at most $m - 1$. Let $h_\nu$, $\nu = 0, \ldots, m - 1$, be the orthonormal basis (under $V$) of $\mathcal{P}_{m-1}$ corresponding to $\gamma_0 = \ldots = \gamma_{m-1} = 0$. Note that $\gamma_\nu > 0$ for $\nu \geq m$. Suppose $g \in L_2(\mathbb{I})$ is such that, for any $\nu \geq m$, $\int_0^1 g h_\nu^{(m)} = 0$. Let $\xi$ be a solution of $\xi^{(m)} = g$; then using integration by parts we have $0 = \int_0^1 \xi^{(m)}h_\nu^{(m)} = (-1)^m\gamma_\nu V(\xi, h_\nu)$. Therefore $V(\xi, h_\nu) = 0$ for any $\nu \geq m$. By completeness of the $h_\nu$'s, $\xi$ must be a linear combination of $h_0, \ldots, h_{m-1}$, i.e., a polynomial of degree at most $m - 1$. So $g = \xi^{(m)} = 0$, implying the completeness of $h_\nu^{(m)}/\sqrt{\gamma_\nu}$, $\nu \geq m$, in $L_2(\mathbb{I})$ under $\langle\cdot,\cdot\rangle_{L_2}$. Now, for any $g \in H^m(\mathbb{I})$, by completeness of the $h_\nu$'s in $L_2(\mathbb{I})$ under the $V$-norm, $g = \sum_{\nu \in \mathbb{N}} V(g, h_\nu)h_\nu$ with convergence in the $V$-norm; since $V(g, h_\nu) = \int_0^1 g^{(m)}h_\nu^{(m)}/\gamma_\nu$, by completeness of $h_\nu^{(m)}/\sqrt{\gamma_\nu}$, $\nu \geq m$, in $L_2(\mathbb{I})$ in the usual $\|\cdot\|_{L_2}$-norm, $g^{(m)} = \sum_{\nu \geq m}\langle g^{(m)}, h_\nu^{(m)}\rangle_{L_2}h_\nu^{(m)}/\gamma_\nu = \sum_{\nu \geq m} V(g, h_\nu)h_\nu^{(m)}$ with convergence in the usual $L_2$-norm, implying that $g = \sum_{\nu} V(g, h_\nu)h_\nu$ converges in $\|\cdot\|$.

Next we show the uniform boundedness of $h_\nu$. We only consider those $h_\nu$'s corresponding to nonzero $\gamma_\nu$'s. If $\gamma_\nu \neq 0$ and $h_\nu$ satisfy $(-1)^m h_\nu^{(2m)} = \gamma_\nu I\pi h_\nu$ and $V(h_\nu, h_\nu) = 1$, then using the boundary conditions in (2.11) and integration by parts one can check that $J(h_\nu, h_\nu) = \gamma_\nu$. Dividing both sides by $I\pi$ and taking $m$-order derivatives, one obtains $Lh_\nu^{(m)} = \gamma_\nu h_\nu^{(m)}$ with $h_\nu^{(m+j)}(0) = h_\nu^{(m+j)}(1) = 0$, $j = 0, \ldots, m - 1$, where $L = (-1)^m\sum_{j=0}^{m}\binom{m}{j}\omega^{(j)}D^{2m-j}$. Therefore, $h_\nu^{(m)}$ is an eigenfunction of $L$ with eigenvalue $\gamma_\nu$. Denote the eigenfunctions and eigenvalues of $L$ by $\psi_\nu$ and $\lambda_\nu$, subject to $\psi_\nu^{(j)}(0) = \psi_\nu^{(j)}(1) = 0$, $j = 0, \ldots, m - 1$. We need to transform $L$ to normal form. Let $t(z) = \int_0^z [I(s)\pi(s)]^{1/(2m)}\,ds/C$, where $C = \int_0^1 [I(z)\pi(z)]^{1/(2m)}\,dz$. Define $\phi_\nu(t(z)) = \psi_\nu(z)$. Then by a direct examination, $\phi_\nu$ satisfies the following differential equation

(A.2)  $\phi_\nu^{(2m)}(t) + q_{2m-1}(t)\phi_\nu^{(2m-1)}(t) + \ldots + q_0(t)\phi_\nu(t) = \rho_\nu\phi_\nu(t)$,  $\phi_\nu^{(j)}(0) = \phi_\nu^{(j)}(1) = 0$, $j = 0, \ldots, m - 1$,

where the $q_j$'s, $j = 0, \ldots, 2m - 1$, are coefficient functions depending only on $I\pi$ and $m$, and $\rho_\nu = \lambda_\nu C^{2m}$.
In general the forms of qjS are complicated though they can be determined by Faa di Bruno's 
formula ([28]). As an illustration, when m = 2, q (t) = 0, q 3 (t) = -(K/4)w( 1 )(z(t))w(z(t))- 3 / 4 , 
q 2 (t) = -{K 2 /4)(ujW(z{t))) 2 u;(z(t))- 3 / 2 , and 

qi (t) =K*(-bu{z{t))-*l\J 1 Xz{t))f/te+te{z^ 

where z(t) is the inverse function of t(z) and 62(2) = [I(z)tv(z)] 1 / 4: . Define 



(A.3)  $u_\nu(t) = \phi_\nu(t)\exp\left(\frac{1}{2m}\int_0^t q_{2m-1}(s)\,ds\right)$,

then (A.2) is equivalent to

(A.4)  $\tilde{L}u_\nu = u_\nu^{(2m)}(t) + p_{2m-2}(t)u_\nu^{(2m-2)}(t) + \ldots + p_0(t)u_\nu(t) = \rho_\nu u_\nu(t)$,

with the boundary conditions $u_\nu^{(j)}(0) = u_\nu^{(j)}(1) = 0$, $j = 0, \ldots, m - 1$. Note that (A.4) is the classic form of differential systems discussed in [6]. According to [6], the $\rho_\nu$'s are simple due to the regular boundary conditions, and the residue of the Green function $G(t_1, t_2, \rho)$ for $\tilde{L} - \rho I$ at the pole $\rho_\nu$ is given by $u_\nu(t_1)u_\nu(t_2)/\|u_\nu\|_{L_2}^2$, where $\|\cdot\|_{L_2}$ denotes the usual $L_2$-norm. On the other hand, the residue can also be represented by $\int_{\Gamma_\nu} 2m\zeta^{2m-1}G(t_1, t_2, \zeta^{2m})\,d\zeta$ (pp. 722, [50]), where $\zeta = \rho^{1/(2m)}$ and $\Gamma_\nu$ denotes the contour centered around the pole $\rho_\nu$ with suitably small radius. By equation (56) and the discussions below it in [6], $2m\zeta^{2m-1}G(t_1, t_2, \zeta^{2m})$ is uniformly bounded for $t_1, t_2 \in \mathbb{I}$; thus, the residue is uniformly bounded for all $t_1, t_2$. In particular, letting $t_1 = t_2 = t$, we get $|u_\nu(t)| \leq c\|u_\nu\|_{L_2}$ for any $t \in \mathbb{I}$ with a universal constant $c > 0$. Since $q_{2m-1}$ achieves finite upper and lower bounds on $\mathbb{I}$, by (A.3), there is a universal constant $c_1 > 0$ such that for any $\nu$, $\|\phi_\nu\|_{\sup} \leq c_1\|\phi_\nu\|_{L_2}$. Now using $\phi_\nu(t(z)) = \psi_\nu(z)$ we get

$\|\psi_\nu\|_{\sup}^2 = \|\phi_\nu\|_{\sup}^2 \leq c_1^2\|\phi_\nu\|_{L_2}^2 = c_1^2\int_0^1 |\phi_\nu(t)|^2\,dt = c_1^2 C^{-1}\int_0^1 |\phi_\nu(t(z))|^2 [I(z)\pi(z)]^{1/(2m)}\,dz \leq c_1^2 c_{I\pi}\|\psi_\nu\|_{L_2}^2$,

where $c_{I\pi}$ is a constant depending only on $I\pi$ and $m$. So $\|\psi_\nu\|_{\sup} \leq c_1 c_{I\pi}^{1/2}\|\psi_\nu\|_{L_2}$. Letting $\psi_\nu = h_\nu^{(m)}$ and using the fact that $\|h_\nu^{(m)}\|_{L_2}^2 = \gamma_\nu$, we have $\|h_\nu^{(m)}\|_{\sup} \leq c_1 c_{I\pi}^{1/2}\gamma_\nu^{1/2}$ for any $\nu$.

By the Sobolev embedding theorem ([1]), $\|h_\nu\|_{\sup}^2 \leq c_2(V(h_\nu, h_\nu) + J(h_\nu, h_\nu)) = c_2(1 + \gamma_\nu)$. Using Theorem 5 of [54], for any $j = 1, \ldots, m$, there is a constant $c''_j > 0$ such that $\|h_\nu^{(j)}\|_{\sup} \leq c''_j(1 + \gamma_\nu)^{1/2}$, $\forall\nu \in \mathbb{N}$. Therefore, taking the $m$-order derivative on both sides of $(-1)^m h_\nu^{(2m)} = \gamma_\nu I\pi h_\nu$, one has, for some constant $C_2 > 0$ and for any $\nu$,

$\|h_\nu^{(3m)}\|_{\sup} \leq C_2(1 + \gamma_\nu)^{3/2}$.

Again, by Theorem 5 of [54] applied to $h_\nu^{(m)}$ with $\epsilon = \gamma_\nu^{-1/(2m)}$, we have $\|h_\nu^{(2m)}\|_{\sup} \leq C'_m(1 + \gamma_\nu)$, which implies $\|h_\nu\|_{\sup} \leq C'_m\left(\inf_{z \in \mathbb{I}}|I(z)\pi(z)|\right)^{-1}(1 + \gamma_\nu)/\gamma_\nu \leq C''_m$, with a universal constant unrelated to $\nu$. This proves the desired uniform boundedness of the $h_\nu$'s.

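A.3. Proof of Lemma 3.1. For any $z \in \mathbb{I}$, $|\langle K_z, g \rangle| \leq \|K_z\| \cdot \|g\|$, so we only need to find the upper bound for $\|K_z\|$. By Proposition 2.1 and the boundedness of the $h_\nu$'s,

(A.5)  $\|K_z\|^2 = K(z, z) = \sum_{\nu}\frac{|h_\nu(z)|^2}{1 + \lambda\gamma_\nu} \leq C\sum_{\nu}\frac{1}{1 + \lambda\gamma_\nu} \leq c_m^2\lambda^{-1/(2m)} = c_m^2 h^{-1}$,

where $c_m > 0$ is a constant that does not rely on $z$ and $h$. So $\|K_z\| \leq c_m h^{-1/2}$.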
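A.4. Proof of Lemma 3.2. For any $g, f \in \mathcal{G}$, by Lemma 3.1,

$\|(\psi_n(T; f) - \psi_n(T; g))K_Z\| \leq c_m^{-1}h^{1/2}\|f - g\|_{\sup}\cdot\|K_Z\| \leq c_m^{-1}h^{1/2}\|f - g\|_{\sup}\cdot c_m h^{-1/2} = \|f - g\|_{\sup}$.

By Theorem 3.5 of [42], for any $t > 0$, $P(\|Z_n(f) - Z_n(g)\| \geq t) \leq 2\exp\left(-\frac{t^2}{8\|f - g\|_{\sup}^2}\right)$. Then by Lemma 8.1 in [31], we have $\big\|\,\|Z_n(g) - Z_n(f)\|\,\big\|_{\psi_2} \leq 8\|g - f\|_{\sup}$, where $\|\cdot\|_{\psi_2}$ denotes the Orlicz norm associated with $\psi_2(s) = \exp(s^2) - 1$. It follows by Theorem 8.4 of [31] that for arbitrary $\delta > 0$,

$\Big\|\sup_{g, f \in \mathcal{G},\ \|g - f\|_{\sup} \leq \delta}\|Z_n(g) - Z_n(f)\|\Big\|_{\psi_2} \leq c\left(\int_0^{\delta}\sqrt{\log(1 + N(s, \mathcal{G}, \|\cdot\|_{\sup}))}\,ds + \delta\sqrt{\log(1 + N(\delta, \mathcal{G}, \|\cdot\|_{\sup})^2)}\right) \leq c' h^{-(2m-1)/(4m)}\delta^{1-1/(2m)}$.

So, again, by Lemma 8.1 in [31],

(A.6)  $P\left(\sup_{g \in \mathcal{G},\ \|g\|_{\sup} \leq \delta}\|Z_n(g)\| \geq t\right) \leq 2\exp\left(-h^{(2m-1)/(2m)}\delta^{-2+1/m}t^2\right)$.

Let $b_n = n^{1/2}h^{-(2m-1)/(4m)}$, $\varepsilon = b_n^{-1}$, $\gamma = 1 - 1/(2m)$, $T_n = (5\log\log n)^{1/2}$, and $Q_\varepsilon = [-\log\varepsilon - 1]$, where $[a]$ denotes the integer part of $a$. Then by (A.6),

$P\left(\sup_{g \in \mathcal{G}}\frac{\sqrt{n}\|Z_n(g)\|}{b_n\|g\|_{\sup}^{\gamma} + 1} \geq T_n\right)$
$\leq P\left(\sup_{g \in \mathcal{G},\ \|g\|_{\sup} \leq \varepsilon^{1/\gamma}}\frac{\sqrt{n}\|Z_n(g)\|}{b_n\|g\|_{\sup}^{\gamma} + 1} \geq T_n\right) + \sum_{l=0}^{Q_\varepsilon}P\left(\sup_{g \in \mathcal{G},\ (2^l\varepsilon)^{1/\gamma} \leq \|g\|_{\sup} \leq (2^{l+1}\varepsilon)^{1/\gamma}}\frac{\sqrt{n}\|Z_n(g)\|}{b_n\|g\|_{\sup}^{\gamma} + 1} \geq T_n\right)$
$\leq P\left(\sup_{g \in \mathcal{G},\ \|g\|_{\sup} \leq \varepsilon^{1/\gamma}}\sqrt{n}\|Z_n(g)\| \geq T_n\right) + \sum_{l=0}^{Q_\varepsilon}P\left(\sup_{g \in \mathcal{G},\ \|g\|_{\sup} \leq (2^{l+1}\varepsilon)^{1/\gamma}}\sqrt{n}\|Z_n(g)\| \geq (1 + 2^l)T_n\right)$
$\leq 2\exp\left(-h^{(2m-1)/(2m)}(\varepsilon^{1/\gamma})^{-2+1/m}T_n^2/n\right) + \sum_{l=0}^{Q_\varepsilon}2\exp\left(-h^{(2m-1)/(2m)}[(2^{l+1}\varepsilon)^{1/\gamma}]^{-2+1/m}T_n^2(2^l + 1)^2/n\right)$
$= 2\exp(-T_n^2) + \sum_{l=0}^{Q_\varepsilon}2\exp\left(-2^{-2(l+1)}T_n^2(2^l + 1)^2\right)$
$\leq 2(Q_\varepsilon + 2)\exp(-T_n^2/4) \leq \mathrm{const}\cdot\log n\,(\log n)^{-5/4} \to 0$,

as $n \to \infty$. This proves the result.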

A.5. Proof of Theorem 3.4. By Assumption A.1 (a), it is not difficult to check the following:

(A.7)  $\max_{1 \leq i \leq n}\sup_{a \in \mathcal{I}}|\ddot{\ell}_a(Y_i; a)| = O_P(\log n)$.

By (A.7) we can let $C > C_0$ be sufficiently large so that the event $B_{n1} = \{\max_{1 \leq i \leq n}\sup_{a \in \mathcal{I}}|\ddot{\ell}_a(Y_i; a)| \leq C\log n\}$ has large probability.

Denote $g = \hat{g}_{n,\lambda} - g_0$. By Assumption A.3, the event $B_{n2} = \{\|g\| \leq r_n \equiv M((nh)^{-1/2} + h^m)\}$ has large probability with some preselected large $M$, so $B_n = B_{n1} \cap B_{n2}$ has large probability. Define $\tilde{g} = d_n^{-1}g$, where $d_n = c_m r_n h^{-1/2}$. Since $h = o(1)$ and $nh^2 \to \infty$, $d_n = o(1)$. Then by Lemma 3.1, on $B_n$, $\|\tilde{g}\|_{\sup} \leq 1$. Note that $J(\tilde{g}, \tilde{g}) = d_n^{-2}\lambda^{-1}(\lambda J(g, g)) \leq d_n^{-2}\lambda^{-1}\|g\|^2 \leq d_n^{-2}\lambda^{-1}r_n^2 \leq c_m^{-2}h\lambda^{-1}$. Thus, when the event $B_n$ holds, $\tilde{g}$ is an element of $\mathcal{G}$.

Define $\psi(T; g) = \dot{\ell}_a(Y; g(Z) + g_0(Z)) - \dot{\ell}_a(Y; g_0(Z))$. By the definition of $S_n$ and a direct calculation, one can verify that $S_n(g + g_0) - S(g + g_0) - (S_n(g_0) - S(g_0)) = \frac{1}{n}\sum_{i=1}^{n}[\psi(T_i; g)K_{Z_i} - E\{\psi(T; g)K_Z\}]$.

Let $\tilde{\psi}_n(T; \tilde{g}) = C^{-1}c_m^{-1}(\log n)^{-1}h^{1/2}d_n^{-1}\psi(T; d_n\tilde{g})$ and $\psi_n(T_i; \tilde{g}) = \tilde{\psi}_n(T_i; \tilde{g})I_{A_i}$, where $A_i = \{\sup_{a \in \mathcal{I}}|\ddot{\ell}_a(Y_i; a)| \leq C\log n\}$ for $i = 1, \ldots, n$. Observe that $B_n$ implies $\cap_{i=1}^{n}A_i$.

Next we show that $\psi_n$ satisfies (3.2). For any $g_1, g_2 \in \mathcal{G}$ and $z \in \mathbb{I}$, since $g_0(z) \in \mathcal{I}_0$ and $d_n = o(1)$, both $g_0(z) + d_n g_1(z)$ and $g_0(z) + d_n g_2(z)$ fall in $\mathcal{I}$ when $n$ is sufficiently large (recall that $\mathcal{I}_0$ and $\mathcal{I}$ are specified in Assumption A.1). Therefore,

$|\psi_n(T_i; g_1) - \psi_n(T_i; g_2)| = C^{-1}c_m^{-1}(\log n)^{-1}h^{1/2}d_n^{-1}|\psi(T_i; d_n g_1) - \psi(T_i; d_n g_2)|\cdot I_{A_i}$
$= C^{-1}c_m^{-1}(\log n)^{-1}h^{1/2}d_n^{-1}\left|\int_{g_0(Z_i)}^{g_0(Z_i) + d_n g_1(Z_i)}\ddot{\ell}_a(Y_i; a)\cdot I_{A_i}\,da - \int_{g_0(Z_i)}^{g_0(Z_i) + d_n g_2(Z_i)}\ddot{\ell}_a(Y_i; a)\cdot I_{A_i}\,da\right|$
$\leq C^{-1}c_m^{-1}(\log n)^{-1}h^{1/2}d_n^{-1}\cdot d_n\|g_1 - g_2\|_{\sup}\cdot\sup_{a \in \mathcal{I}}|\ddot{\ell}_a(Y_i; a)|\cdot I_{A_i}$
$\leq C^{-1}c_m^{-1}(\log n)^{-1}h^{1/2}d_n^{-1}\cdot d_n\cdot C\log n\cdot\|g_1 - g_2\|_{\sup}$
$= c_m^{-1}h^{1/2}\|g_1 - g_2\|_{\sup}$.

Thus, $\psi_n$ satisfies (3.2). By Lemma 3.2, with large probability,

(A.8)  $\left\|\sum_{i=1}^{n}[\psi_n(T_i; \tilde{g})K_{Z_i} - E\{\psi_n(T; \tilde{g})K_Z\}]\right\| \leq (n^{1/2}h^{-(2m-1)/(4m)} + 1)(5\log\log n)^{1/2}$.

On the other hand, by Chebyshev's inequality,

$P(A_i^c) \leq \exp(-(C/C_0)\log n)E\{\exp(\sup_{a \in \mathcal{I}}|\ddot{\ell}_a(Y_i; a)|/C_0)\} \leq C_1 n^{-C/C_0}$.

Since $h = o(1)$ and $nh^2 \to \infty$, we may choose $C$ to be large so that $2^{1/2}C^{-1}C_0C_1(\log n)^{-1}n^{-C/(2C_0)} \leq a'_n h^{1/2}d_n^{-1}$, where $a'_n = n^{-1/2}((nh)^{-1/2} + h^m)h^{-(6m-1)/(4m)}(\log\log n)^{1/2}$. By (2.3), which implies $E\{\sup_{a \in \mathcal{I}}|\ddot{\ell}_a(Y_i; a)|^2 \mid Z_i\} \leq 2C_1C_0^2$, we have, on $B_n$, $E\{|\psi(T; d_n\tilde{g})|^2\} \leq 2C_1C_0^2d_n^2$, where the expectation is taken with respect to $T = (Y, Z)$. So when $n$ is large, on $B_n$, by Chebyshev's inequality,

$\|E\{\psi_n(T_i; \tilde{g})K_{Z_i}\} - E\{\tilde{\psi}_n(T_i; \tilde{g})K_{Z_i}\}\| = \|E\{\tilde{\psi}_n(T_i; \tilde{g})K_{Z_i}\cdot I_{A_i^c}\}\|$
$\leq C^{-1}(\log n)^{-1}d_n^{-1}\left(E\{|\psi(T_i; d_n\tilde{g})|^2\}\right)^{1/2}P(A_i^c)^{1/2}$
$\leq 2^{1/2}C^{-1}C_0C_1(\log n)^{-1}n^{-C/(2C_0)}$
$\leq a'_n h^{1/2}d_n^{-1}$,

where the expectation is taken with respect to $T_i$. Therefore, by (A.8) and on $B_n$,

$\|S_n(g + g_0) - S(g + g_0) - (S_n(g_0) - S(g_0))\|$
$= \frac{Cc_m(\log n)h^{-1/2}d_n}{n}\left\|\sum_{i=1}^{n}[\tilde{\psi}_n(T_i; \tilde{g})K_{Z_i} - E\{\tilde{\psi}_n(T; \tilde{g})K_Z\}]\right\|$
$\leq \frac{Cc_m(\log n)h^{-1/2}d_n}{n}\left(\left\|\sum_{i=1}^{n}[\psi_n(T_i; \tilde{g})K_{Z_i} - E\{\psi_n(T; \tilde{g})K_Z\}]\right\| + n\|E\{\psi_n(T_i; \tilde{g})K_{Z_i}\} - E\{\tilde{\psi}_n(T_i; \tilde{g})K_{Z_i}\}\|\right)$
$\leq \frac{Cc_m(\log n)h^{-1/2}d_n}{n}\left[(n^{1/2}h^{-(2m-1)/(4m)} + 1)(5\log\log n)^{1/2} + na'_n h^{1/2}d_n^{-1}\right]$
(A.9)  $\leq C'c_m a'_n\log n$,

for some constant $C' > 0$ that only depends on $C$, $c_m$, $M$.

By Taylor's expansion, by the fact $S_{n,\lambda}(g + g_0) = 0$, and by Proposition 2.3,

$\|S_n(g + g_0) - S(g + g_0) - (S_n(g_0) - S(g_0))\|$
$= \|S_{n,\lambda}(g + g_0) - S_\lambda(g + g_0) - S_{n,\lambda}(g_0) + S_\lambda(g_0)\|$
$= \|S_\lambda(g + g_0) + S_{n,\lambda}(g_0) - S_\lambda(g_0)\|$
$= \left\|DS_\lambda(g_0)g + \int_0^1\int_0^1 sD^2S_\lambda(g_0 + ss'g)gg\,ds\,ds' + S_{n,\lambda}(g_0)\right\|$
$= \left\|-g + \int_0^1\int_0^1 sD^2S_\lambda(g_0 + ss'g)gg\,ds\,ds' + S_{n,\lambda}(g_0)\right\|$
$\geq \|-g + S_{n,\lambda}(g_0)\| - \left\|\int_0^1\int_0^1 sD^2S_\lambda(g_0 + ss'g)gg\,ds\,ds'\right\|$.

Therefore,

$\|g - S_{n,\lambda}(g_0)\| \leq \|S_n(g + g_0) - S(g + g_0) - (S_n(g_0) - S(g_0))\| + \left\|\int_0^1\int_0^1 sD^2S_\lambda(g_0 + ss'g)gg\,ds\,ds'\right\|$
(A.10)  $\leq \|S_n(g + g_0) - S(g + g_0) - (S_n(g_0) - S(g_0))\| + \int_0^1\int_0^1 s\|D^2S_\lambda(g_0 + ss'g)gg\|\,ds\,ds'$.

Next we find an upper bound for $\|D^2S_\lambda(g_0 + ss'g)gg\|$. The Fréchet derivative of $DS_\lambda$ is found to be $D^2S_\lambda = D^2S$; therefore, $D^2S_\lambda(g_0 + ss'g)gg = D^2S(g_0 + ss'g)gg = E\{\ell'''_a(Y; (g_0 + ss'g)(Z))g(Z)^2K_Z\}$, where the expectation is taken with respect to $T$. Hence, by (2.4), on $B_n$,

$\|D^2S_\lambda(g_0 + ss'g)gg\| = \|E\{\ell'''_a(Y; (g_0 + ss'g)(Z))g(Z)^2K_Z\}\| \leq E\{E\{\sup_{a \in \mathcal{I}}|\ell'''_a(Y; a)| \mid Z\}g(Z)^2\|K_Z\|\}$
(A.11)  $\leq C_\ell c_m h^{-1/2}\|g\|^2$,

where $C_\ell = \sup_{z \in \mathbb{I}}E\{\sup_{a \in \mathcal{I}}|\ell'''_a(Y; a)| \mid Z = z\}$. Thus, from (A.9), (A.10) and (A.11), with large probability, $\|g - S_{n,\lambda}(g_0)\| \leq C'c_m a'_n\log n + C_\ell c_m h^{-1/2}M^2((nh)^{-1/2} + h^m)^2$. This completes the proof of Theorem 3.4.
A.6. Proof of Theorem 3.5. Define $Rem_n = \hat{g}_{n,\lambda} - g_0^* - \frac{1}{n}\sum_{i=1}^{n}\epsilon_i K_{Z_i}$. By Theorem 3.4, $Rem_n$ satisfies $\|Rem_n\| = O_P(a_n\log n)$. By the assumption $a_n\log n = o(n^{-1/2})$, $\|Rem_n\| = o_P(n^{-1/2})$. Since $E\{\|\sum_{i=1}^{n}\epsilon_i K_{Z_i}\|^2\} = nE\{\epsilon^2\|K_Z\|^2\} = O(nh^{-1})$, $\|\frac{1}{n}\sum_{i=1}^{n}\epsilon_i K_{Z_i}\| = O_P((nh)^{-1/2})$. Thus, $Rem_n$ is negligible compared with $\frac{1}{n}\sum_{i=1}^{n}\epsilon_i K_{Z_i}$.

Next we show the limiting distribution of $(nh)^{1/2}(\hat{g}_{n,\lambda}(z_0) - g_0^*(z_0))$. Note that this is equal to $(nh)^{1/2}\langle K_{z_0}, \hat{g}_{n,\lambda} - g_0^*\rangle$. Using the fact

$\left|(nh)^{1/2}\langle K_{z_0}, \hat{g}_{n,\lambda} - g_0^* - \frac{1}{n}\sum_{i=1}^{n}\epsilon_i K_{Z_i}\rangle\right| \leq (nh)^{1/2}\|K_{z_0}\|\cdot\|Rem_n\| = O_P((nh)^{1/2}h^{-1/2}a_n\log n) = o_P(1)$,

we just need to find the limiting distribution of $(nh)^{1/2}\langle K_{z_0}, \frac{1}{n}\sum_{i=1}^{n}\epsilon_i K_{Z_i}\rangle = (nh^{-1})^{-1/2}\sum_{i=1}^{n}\epsilon_i K_{Z_i}(z_0)$. By Assumption A.1 (c), i.e., $E\{\epsilon^2 \mid Z\} = I(Z)$, we have

$\mathrm{Var}\left(\sum_{i=1}^{n}\epsilon_i K_{Z_i}(z_0)\right) = nE\{\epsilon^2|K_Z(z_0)|^2\} = nE\{E\{\epsilon^2 \mid Z\}|K_Z(z_0)|^2\} = nE\{I(Z)|K_Z(z_0)|^2\} = nV(K_{z_0}, K_{z_0})$.

By assumption, as $h \to 0$, $hV(K_{z_0}, K_{z_0}) \to \sigma^2_{z_0}$. Thus, $(nh^{-1})^{-1/2}\sum_{i=1}^{n}\epsilon_i K_{Z_i}(z_0) \overset{d}{\to} N(0, \sigma^2_{z_0})$ by the CLT. The expression of $\sigma^2_{z_0}$, i.e., (3.8), follows from Proposition 2.1. This completes the proof.

A. 7. Proof of Theorem 4-4- F° r notational convenience, denote g = g nX , 9° = 9^ \i 9 = 
wq + g° — g. By Assumptions A. 3 and A. 4, with large probability, \\g\\ = Op(r n ), where r n = 
M((n/i) -1 / 2 + h m ) for some large M. By Assumption A.l (a), for some large constant C > 0, the 
event B n \ n -B n 2 has large probability, where -B n i = {maxi<j< ra sup ag j \i a (Yi] a)\ < Clogra} and 
B n 2 = {maxi<j< n sup ag j \£'"(Yi; a)| < Clogn}. Let a n be defined as in (3.5). 

By Taylor expansion, 

LRT n , x = £ n ,x(wo + g°)-e niX (g) 



S n ,\(g)g+ / / sDS nt x(g + ss'g)ggdsds' 
Jo Jo 



1 fl 



J 
1 r l 



sDS nt x(g + ss'g)ggdsds' 



s{DS U: x(g + ss'g)gg - DS n ,x{go)gg}dsds' 

io Jo 

(A.12) +\(DS n ,\(g Q )gg - E{DS n , x (go)gg}) + ^E{DS n ,x(g )gg}, 

denote the above three sums by I\, I2 and I3. Next we will study the asymptotic behavior of these 
sums. Denote g = g + ss'g — go, for any < s, s' < 1. So ||g|| = Op(r n ). 
We first study Ji. By calculations of the Frechet derivatives, we have 

1 " 

DS n ,x(g + ss'g)gg = DS n , x (g + g )gg = - V W;<?o(^) +g{Z i ))g(Z i ) 2 - (W x g,g)/2, 

i=l 
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and DS n , x (g )gg = ± E7=Ja(Yi; g (Zi))g(Zi) 2 - (W x g,g)/2. On B nl n S n2 , 
\DS n: x(g + ss'g)gg - DS n ,\{g )gg\ 

2 



(A.13) 



< -CaognJUgllsupVffCZO 1 
i=l 

1 ™ 

= C{\ogn)\\gU P (-Y,9{Zi)K Zi ,g) 
i=l 
1 n 

= C(logn)||5|| S u P <~5>(^)^ -E{g(Z)K z }, g) + C(logn)||5||sup^{5(^) 2 }, 



i=l 



where the expectations are taken with respect to Z. Now we look at - 1| ^I=i 9(Zi)K Zi —E{g(Z)K z }\\. 
Let d n = CmhT^rn and g = d~ l g. Consider ip(T;g) = g(Z) and ij} n (T;g) = c^h^^d' 1 ^^; d n g) 
(which satisfies (3.2)). Then by Lemma 3.2, 



??■ 



J2[9(Zi)K Zi -E{g(Z)K z }} 



i=l 



■n 



Yj^n{T i; g)K Zi - E{i> n (T;g)K z }) 



(A.14) 



where o{, = n~ 1 / 2 ((n/t)- 1 / 2 + ft™)/^ 6 ™- 1 ^ 4 ™) (log logn) 1 / 2 . Obviously, £{#(Z) 2 } = 0(||y-|| 2 ) 
Op(r 2 ). So by a' n = o(r„), 



\DS n ,\{g + ss'g)gg - DS ntX (g )gg\ 



(A.15) 



1 1 y 1 1 sup 

o, n r n log n) + Op(r 2 log n)) 
P (r^- x / 2 logn). 



Thus, = P {rlh- l l 2 \ogn). 

Next we study $I_2$. By an argument similar to (A.9), it can be shown that

(A.16) $\quad \frac{1}{n}\Big\|\sum_{i=1}^n [\ddot\ell_a(Y_i; g_0(Z_i))g(Z_i)K_{Z_i} - E\{\ddot\ell_a(Y; g_0(Z))g(Z)K_Z\}]\Big\| = O_P(a_n'\log n).$

Thus, $|I_2| = O_P(a_n' r_n\log n)$.

Note $I_3 = -\|g\|^2/2$. Therefore, combining the above approximations of $I_1$ and $I_2$, we have
$-2n\cdot LRT_{n,\lambda} = n\|w_0 + \hat g^0 - \hat g\|^2 + O_P(nr_na_n'\log n + nr_n^3h^{-1/2}\log n) = n\|w_0 + \hat g^0 - \hat g\|^2 + O_P(nr_na_n\log n + nr_n^3h^{-1/2}\log n)$.
By $r_n^2h^{-1/2} = o(a_n)$ and $nr_na_n = o((\log n)^{-1})$, it is easy to see that $O_P(nr_na_n\log n + nr_n^3h^{-1/2}\log n) = o_P(1)$. Thus, part (ii) holds. So, to find the limiting distribution of the LRT test,
we only focus on $n\|w_0 + \hat g^0 - \hat g\|^2$. By Theorems 3.4 and 4.3,

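(A.17) $\quad n^{1/2}\|w_0 + \hat g^0 - \hat g - S^0_{n,\lambda}(g_0^0) + S_{n,\lambda}(g_0)\| = O_P(n^{1/2}a_n\log n) = o_P(1),$

so we just have to focus on $n^{1/2}\{S^0_{n,\lambda}(g_0^0) - S_{n,\lambda}(g_0)\}$. Recall that

$S^0_{n,\lambda}(g_0^0) = \frac{1}{n}\sum_{i=1}^n \epsilon_i K^*_{Z_i} - W^*_\lambda g_0^0$
$\qquad = \frac{1}{n}\sum_{i=1}^n \epsilon_i\big(K_{Z_i} - K_{Z_i}(z_0)K_{z_0}/K(z_0,z_0)\big) - W_\lambda g_0 + (W_\lambda g_0)(z_0)K_{z_0}/K(z_0,z_0),$

where $\epsilon_i = \dot\ell_a(Y_i; g_0(Z_i))$, and $S_{n,\lambda}(g_0) = \frac{1}{n}\sum_{i=1}^n \epsilon_i K_{Z_i} - W_\lambda g_0$. Thus,

(A.18) $\quad S^0_{n,\lambda}(g_0^0) - S_{n,\lambda}(g_0) = \Big(-\frac{1}{n}\sum_{i=1}^n \epsilon_i K_{Z_i}(z_0) + (W_\lambda g_0)(z_0)\Big)K_{z_0}/K(z_0,z_0).$

So $n\|S^0_{n,\lambda}(g_0^0) - S_{n,\lambda}(g_0)\|^2 = \big|n^{-1/2}\sum_{i=1}^n \epsilon_i K_{Z_i}(z_0)/\sqrt{K(z_0,z_0)} - \sqrt{n}(W_\lambda g_0)(z_0)/\sqrt{K(z_0,z_0)}\big|^2$. By the
central limit theorem, (4.10) and $\sqrt{n}(W_\lambda g_0)(z_0)/\sqrt{K(z_0,z_0)} \to -c_{z_0}$, we have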

(A.19) $\quad n^{-1/2}\sum_{i=1}^n \epsilon_i K_{Z_i}(z_0)/\sqrt{K(z_0,z_0)} - \sqrt{n}(W_\lambda g_0)(z_0)/\sqrt{K(z_0,z_0)} \stackrel{d}{\to} N(c_{z_0}, c_0).$

It follows by (A.17)-(A.19) that $-2n\cdot LRT_{n,\lambda} \stackrel{d}{\to} c_0\chi^2_1(c_{z_0}^2/c_0)$, the scaled non-central $\chi^2$ distribution
with one degree of freedom and noncentrality parameter $c_{z_0}^2/c_0$, which shows (iii). It follows
immediately that $\|w_0 + \hat g^0 - \hat g\| = O_P(n^{-1/2})$, i.e., part (i) holds. This completes the proof.
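
The last step uses the fact that the square of a $N(c, c_0)$ variable (mean $c$, variance $c_0$) is distributed as $c_0$ times a non-central $\chi^2_1$ variable with noncentrality $c^2/c_0$. A minimal Python sketch, with illustrative values for $c$ and $c_0$ and hypothetical function names that are not from the paper, checks that the first two moments of the two representations agree:

```python
import math
import random


def squared_normal_moments(c, c0):
    """Mean and variance of X**2 for X ~ N(c, c0), where c0 is the variance."""
    mean = c0 + c ** 2
    var = 2.0 * c0 ** 2 + 4.0 * c ** 2 * c0
    return mean, var


def scaled_noncentral_moments(c, c0):
    """Mean and variance of c0 * chi2_1(nc) with noncentrality nc = c**2 / c0."""
    nc = c ** 2 / c0
    mean = c0 * (1.0 + nc)              # E chi2_1(nc) = 1 + nc
    var = c0 ** 2 * 2.0 * (1.0 + 2.0 * nc)  # Var chi2_1(nc) = 2 (1 + 2 nc)
    return mean, var


def simulate_squared_normal_mean(c, c0, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E{X**2} for X ~ N(c, c0) with a fixed seed."""
    rng = random.Random(seed)
    sd = math.sqrt(c0)
    return sum(rng.gauss(c, sd) ** 2 for _ in range(n_draws)) / n_draws


if __name__ == "__main__":
    print(squared_normal_moments(1.0, 2.0))
    print(scaled_noncentral_moments(1.0, 2.0))
    print(simulate_squared_normal_mean(1.0, 2.0))
```

Matching moments do not prove equality in distribution; the check is only meant to make the scaling in part (iii) concrete.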

A.8. Proof of Theorem 5.1. By Theorem 3.4 and Lemma 3.1,

(A.20) $\quad \Big\|\hat g - g_0^* - \frac{1}{n}\sum_{i=1}^n \epsilon_i K_{Z_i}\Big\|_{\sup} = O_P(a_n h^{-1/2}\log n).$

So the key is to study the leading process $H_n(z) = n^{-1/2}\sum_{i=1}^n \epsilon_i K_{Z_i}(z)$.

Since $E\{\exp(|\epsilon|/C_1)|Z\} \le C_2$, a.s., we may fix a sufficiently large constant $C > (1 + 3\delta)C_1$
such that the event $E_n = \{\max_{1\le i\le n}|\epsilon_i| \le b_n \equiv C\log n\}$ has large probability. Define
$H_n^b(z) = n^{-1/2}\sum_{i=1}^n \epsilon_i I(|\epsilon_i| \le b_n)K_{Z_i}(z)$. Write $H_n(z) = H_n(z) - H_n^b(z) - E\{H_n(z) - H_n^b(z)\} + H_n^b(z) - E\{H_n^b(z)\}$.
Obviously, on $E_n$, $H_n(z) - H_n^b(z) = 0$. By Chebyshev's inequality and Lemma 3.1, we
have

$|E\{H_n(z) - H_n^b(z)\}| = n^{1/2}|E\{\epsilon I(|\epsilon| > b_n)K_Z(z)\}|$
$\qquad \le O(1)h^{-1/2}n^{1/2}E\{|\epsilon|\, I(|\epsilon| > b_n)\}$
$\qquad \le O(1)h^{-1/2}n^{1/2}E\{\epsilon^2\}^{1/2}P(|\epsilon| > b_n)^{1/2}$
$\qquad = O(h^{-1/2}n^{1/2}\exp(-b_n/(2C_1))).$

Thus,

(A.21) $\quad \sup_{z\in\mathbb{I}}|H_n(z) - H_n^b(z) - E\{H_n(z) - H_n^b(z)\}| = O_P(h^{-1/2}n^{1/2}\exp(-b_n/(2C_1))).$

Denote $R_n(z) = H_n^b(z) - E\{H_n^b(z)\}$, then by (A.21),

(A.22) $\quad \sup_{z\in\mathbb{I}}|H_n(z) - R_n(z)| = O_P(h^{-1/2}n^{1/2}\exp(-b_n/(2C_1))).$
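
To make the truncation step concrete, the following Python sketch evaluates the two quantities compared above for one specific error law. It assumes, purely for illustration, a standard Laplace error with density $\exp(-|e|)/2$, for which $E\{|\epsilon| I(|\epsilon| > b)\} = (b+1)e^{-b}$, $E\{\epsilon^2\} = 2$ and $P(|\epsilon| > b) = e^{-b}$; the function names and the numerical constants are hypothetical.

```python
import math


def laplace_tail_moment(b):
    """Exact E{|e| 1(|e| > b)} when e has density exp(-|e|)/2."""
    return (b + 1.0) * math.exp(-b)


def cauchy_schwarz_bound(b):
    """E{e^2}^{1/2} * P(|e| > b)^{1/2} for the same error: sqrt(2) * exp(-b/2)."""
    return math.sqrt(2.0) * math.exp(-b / 2.0)


def truncation_remainder(n, h, C, C1):
    """The rate h^{-1/2} n^{1/2} exp(-b_n / (2 C1)) with b_n = C log n."""
    b_n = C * math.log(n)
    return h ** -0.5 * n ** 0.5 * math.exp(-b_n / (2.0 * C1))


if __name__ == "__main__":
    for b in (0.0, 1.0, 5.0):
        print(b, laplace_tail_moment(b), cauchy_schwarz_bound(b))
    for n in (1e3, 1e6):
        print(n, truncation_remainder(n, n ** -0.2, C=4.0, C1=1.0))
```

Since $b_n = C\log n$, the remainder equals $h^{-1/2}n^{1/2 - C/(2C_1)}$, so a larger $C$ makes it vanish polynomially fast.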

Let $Z_n(e,z) = n^{1/2}(P_n(e,z) - P(e,z))$, where $P_n(e,z)$ and $P(e,z)$ are the empirical and population distributions
of $(\epsilon, Z)$. Then by Theorem 1 of [53], $\sup_{e\in\mathbb{R},\, z\in\mathbb{I}}|Z_n(e,z) - W(\tau(e,z))| = O_P(n^{-1/2}(\log n)^2)$,
where $W$ is a Brownian bridge indexed by $[0,1]\times[0,1]$, $\tau(e,z) = (P_Z(z), P_{\epsilon|Z}(e|z))$, $P_Z$ is the marginal
distribution of $Z$, and $P_{\epsilon|Z}$ is the conditional distribution of $\epsilon$ given $Z$. Write

$R_n(z) = \int_0^1\int_{-b_n}^{b_n} e K(z,t)\,dZ_n(e,t) = \int_0^1 K(z,t)\,dV_n(t)$, and

$R_n^0(z) = \int_0^1\int_{-b_n}^{b_n} e K(z,t)\,dW(\tau(e,t)) = \int_0^1 K(z,t)\,dV_n^0(t),$

where $V_n(t) = \int_{-b_n}^{b_n} e\,d_eZ_n(e,t)$ and $V_n^0(t) = \int_{-b_n}^{b_n} e\,d_eW(\tau(e,t))$. By integration by parts,

$V_n(t) = Z_n(e,t)e\,\big|_{e=-b_n}^{b_n} - \int_{-b_n}^{b_n} Z_n(e,t)\,de$, and

$V_n^0(t) = W(\tau(e,t))e\,\big|_{e=-b_n}^{b_n} - \int_{-b_n}^{b_n} W(\tau(e,t))\,de.$

So $\sup_{t\in\mathbb{I}}|V_n(t) - V_n^0(t)| = O_P(b_n n^{-1/2}(\log n)^2)$.
By integration by parts again, we have

$R_n(z) = V_n(t)K(z,t)\,\big|_{t=0}^{1} - \int_0^1 V_n(t)\frac{d}{dt}K(z,t)\,dt$, and

$R_n^0(z) = V_n^0(t)K(z,t)\,\big|_{t=0}^{1} - \int_0^1 V_n^0(t)\frac{d}{dt}K(z,t)\,dt.$

Therefore, by the assumption $\sup_{z,t}|\frac{d}{dt}K(z,t)| = O(h^{-2})$, we have

(A.23) $\quad \sup_{z\in\mathbb{I}}|R_n(z) - R_n^0(z)| = O_P(h^{-2}b_n n^{-1/2}(\log n)^2).$



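Write $W(t_1,t_2) = B(t_1,t_2) - t_1t_2B(1,1)$, where $B$ is standard Brownian motion indexed on
$[0,1]\times[0,1]$. Define $\bar R_n^0(z) = \int_0^1 K(z,t)\,dU_n^0(t)$, where $U_n^0(t) = \int_{-b_n}^{b_n} e\,d_eB(\tau(e,t))$. Direct calculations
lead to $R_n^0(z) - \bar R_n^0(z) = -B(1,1)\int_0^1\int_{-b_n}^{b_n} K(z,t)\,e\,dP(e,t)$. Therefore, by Lemma 3.1 and the finite
exponential moment of $|\epsilon|$,

(A.24) $\quad \sup_{z\in\mathbb{I}}|R_n^0(z) - \bar R_n^0(z)|$
$\qquad = |B(1,1)|\cdot\sup_{z\in\mathbb{I}}\Big|\int_0^1 K(z,t)\int_{-b_n}^{b_n} e\,dP_{\epsilon|Z}(e|t)\,dP_Z(t)\Big|$
$\qquad = |B(1,1)|\cdot\sup_{z\in\mathbb{I}}\Big|\int_0^1 K(z,t)E\{\epsilon I(|\epsilon| \le b_n)|Z = t\}\,dP_Z(t)\Big|$
$\qquad = |B(1,1)|\cdot\sup_{z\in\mathbb{I}}\Big|\int_0^1 K(z,t)E\{\epsilon I(|\epsilon| > b_n)|Z = t\}\,dP_Z(t)\Big|$
$\qquad \le c_m^2h^{-1}|B(1,1)|\,E\{|\epsilon| I(|\epsilon| > b_n)\}$
$\qquad = O_P(h^{-1}\exp(-b_n/(2C_1))).$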



Define $\tilde R_n^0(z) = \int_0^1 h^{-1}\omega((z-t)/h)\,dU_n^0(t)$. Using integration by parts, we get
$U_n^0(t) = B(\tau(e,t))e\,\big|_{e=-b_n}^{b_n} - \int_{-b_n}^{b_n} B(\tau(e,t))\,de$, so we have $\sup_{t\in\mathbb{I}}|U_n^0(t)| = O_P(b_n)$. Again, by integration by parts,

$\tilde R_n^0(z) - \bar R_n^0(z) = U_n^0(t)\big[h^{-1}\omega((z-t)/h) - K(z,t)\big]\Big|_{t=0}^{1} - \int_0^1 U_n^0(t)\frac{d}{dt}\big(h^{-1}\omega((z-t)/h) - K(z,t)\big)\,dt,$

which, by assumption (5.1), leads to

(A.25) $\quad \sup_{h^\varphi \le z \le 1-h^\varphi}|\tilde R_n^0(z) - \bar R_n^0(z)| = O_P(b_n\exp(-C_2h^{-1+\varphi})).$

By the proof of Lemma 3.7 in [22], the process $\tilde R_n^0(z)$ is Gaussian with mean zero and has the
same distribution as the process $Y_z^{(n)} = h^{-1}\int_0^1 (I_n(t))^{1/2}\omega((z-t)/h)\,dW(t)$, where $W$ is standard
one-dimensional Brownian motion indexed on $\mathbb{R}$, and $I_n(z) = E\{\epsilon^2I(|\epsilon| \le b_n)|Z = z\}$. Define
$Y_{0,z}^{(n)} = h^{-1}\int_0^1 (I(t))^{1/2}\omega((z-t)/h)\,dW(t)$. Obviously, $\sup_{z\in\mathbb{I}}|I(z) - I_n(z)| = O(\exp(-b_n/(2C_1)))$.
It follows from the assumption (5.5) and $E\{\exp(|\epsilon|/C_1)|Z\} \le C_2$, a.s., that



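$\sup_{z\in\mathbb{I}}\Big|\frac{d}{dz}(I(z) - I_n(z))\Big| = \sup_{z\in\mathbb{I}}\Big|\int_{|e|>b_n} e^2\frac{d}{dz}\pi(e|z)\,de\Big|$
$\qquad \le \sup_{z\in\mathbb{I}}\int_{|e|>b_n} e^2\rho_1(1 + |e|^{\rho_2})\pi(e|z)\,de$
$\qquad = \sup_{z\in\mathbb{I}}\rho_1E\{\epsilon^2(1 + |\epsilon|^{\rho_2})I(|\epsilon| > b_n)|Z = z\} = O(\exp(-b_n/(2C_1))).$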



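By (5.5) and trivial calculations, it can be shown that $\sup_{t\in\mathbb{I}}|\frac{d}{dt}I(t)| < \infty$. Since when $n$ is large,
both $I$ and $I_n$ are bounded away from zero,

$\Big|\frac{d}{dt}\big(I(t)^{1/2} - I_n(t)^{1/2}\big)\Big| = \frac{1}{2}\cdot\frac{|I'(t)I_n(t)^{1/2} - I_n'(t)I(t)^{1/2}|}{I(t)^{1/2}I_n(t)^{1/2}}$
$\qquad \le \frac{1}{2}\cdot\frac{|I'(t)|\cdot|I(t)^{1/2} - I_n(t)^{1/2}| + I(t)^{1/2}|I'(t) - I_n'(t)|}{I(t)^{1/2}I_n(t)^{1/2}}$
$\qquad = O(\exp(-b_n/(2C_1))),$

where for convenience we denote $I'(t)$ to be the derivative of $I(t)$. By integration by parts,

$Y_{0,z}^{(n)} - Y_z^{(n)} = h^{-1}W(t)\big(I(t)^{1/2} - I_n(t)^{1/2}\big)\omega((z-t)/h)\Big|_{t=0}^{1} - h^{-1}\int_0^1 W(t)\frac{d}{dt}\Big(\big(I(t)^{1/2} - I_n(t)^{1/2}\big)\omega((z-t)/h)\Big)\,dt$
$\qquad = h^{-1}W(t)\big(I(t)^{1/2} - I_n(t)^{1/2}\big)\omega((z-t)/h)\Big|_{t=0}^{1} - h^{-1}\int_0^1 W(t)\frac{d}{dt}\big(I(t)^{1/2} - I_n(t)^{1/2}\big)\cdot\omega((z-t)/h)\,dt$
$\qquad\quad + h^{-2}\int_0^1 W(t)\big(I(t)^{1/2} - I_n(t)^{1/2}\big)\cdot\omega'((z-t)/h)\,dt,$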
for which we have

(A.26) $\quad \sup_{z\in\mathbb{I}}|Y_{0,z}^{(n)} - Y_z^{(n)}| = O_P(h^{-2}\exp(-b_n/(2C_1))).$

Next we define $Y_{1,z}^{(n)} = h^{-1}I(z)^{1/2}\int_0^1 \omega((z-t)/h)\,dW(t)$. Then we have

$Y_{0,z}^{(n)} - Y_{1,z}^{(n)} = h^{-1}\int_0^1 \big[I(t)^{1/2} - I(z)^{1/2}\big]\omega((z-t)/h)\,dW(t)$
$\qquad = h^{-1}\int_{z/h}^{(z-1)/h}\big(I(z-sh)^{1/2} - I(z)^{1/2}\big)\omega(s)\,dW(z-sh)$
$\qquad = h^{-1}W(z-sh)\big(I(z-sh)^{1/2} - I(z)^{1/2}\big)\omega(s)\Big|_{s=z/h}^{(z-1)/h} - h^{-1}\int_{z/h}^{(z-1)/h} W(z-sh)\frac{d}{ds}\Big(\big(I(z-sh)^{1/2} - I(z)^{1/2}\big)\omega(s)\Big)\,ds.$

Using the fact that $|I(z-sh)^{1/2} - I(z)^{1/2}| \le C_I|s|h$, for some positive constant $C_I$ and any
$z, s \in \mathbb{I}$, that $|\omega(s)| \le C_\omega\exp(-|s|/C_3)$, which implies $|\omega(z/h)| \le C_\omega\exp(-h^{\varphi-1}/C_3) = O(h)$ and
$|\omega((z-1)/h)| \le C_\omega\exp(-h^{\varphi-1}/C_3) = O(h)$ for $h^\varphi \le z \le 1-h^\varphi$, and that $\omega'$ is bounded, it can be
verified that

(A.27) $\quad \sup_{h^\varphi \le z \le 1-h^\varphi}|Y_{0,z}^{(n)} - Y_{1,z}^{(n)}| = O_P(1).$



The last random process we will consider is $L_z^{(n)} = h^{-1}I(z)^{1/2}\int_{\mathbb{R}}\omega((z-t)/h)\,dW(t)$. We will
establish the rate of convergence for $\sup_{h^\varphi \le z \le 1-h^\varphi}|L_z^{(n)} - Y_{1,z}^{(n)}|$. For this purpose, we need the
following result.

Lemma A.1. For any $\kappa > 1/2$, $\lim_{d\to\infty}P\big(\sup_{s\in\mathbb{R}}\frac{|W(s)|}{(1+|s|)^\kappa} > d\big) = 0$.



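Proof of Lemma A.1. Let $D_\kappa = \sup_{s\ge 0}\frac{|W(s)|}{(1+s)^\kappa}$; we will only show $\lim_{d\to\infty}P(D_\kappa > d) = 0$.
The proof for $\sup_{s\le 0}$ is similar. Let $\mathbb{Z}_+ = \{0, 1, \ldots\}$ be the set of nonnegative integers. Note
$\sup_{s\ge 0}\frac{|W(s)|}{(1+s)^\kappa} = \sup_{m\in\mathbb{Z}_+}\sup_{m\le s\le m+1}\frac{|W(s)|}{(1+s)^\kappa}$. Choose a constant $\beta > 0$ such that $(\beta+1)(\kappa - 1/2) > 1$. Then

$P(D_\kappa > d) = P\Big(\sup_{m\in\mathbb{Z}_+}\sup_{m\le s\le m+1}\frac{|W(s)|}{(1+s)^\kappa} > d\Big)$
$\qquad \le \sum_{m=0}^\infty P\Big(\sup_{m\le s\le m+1}\frac{|W(s)|}{(1+s)^\kappa} > d\Big)$
$\qquad \le \sum_{m=0}^\infty P\Big(\sup_{m\le s\le m+1}|W(s)| > (1+m)^\kappa d\Big)$
$\qquad \le \sum_{m=0}^\infty P\Big(\sup_{0\le s\le m+1}|W(s)| > (1+m)^\kappa d\Big)$
(A.28) $\qquad \le \frac{4}{(2\pi)^{1/2}}\sum_{m=0}^\infty\frac{\exp(-(d(1+m)^{\kappa-1/2})^2/2)}{d(1+m)^{\kappa-1/2}}$
$\qquad \le \frac{4}{(2\pi)^{1/2}}\sum_{m=0}^\infty\frac{1}{(d(1+m)^{\kappa-1/2})^{\beta+1}} = O(d^{-(\beta+1)}),$

where (A.28) follows by [29]. Therefore, the desired result holds. □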
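
The Gaussian tail series appearing in the proof of Lemma A.1 can be evaluated numerically to see how quickly it decays in $d$. The sketch below is a hypothetical helper (the name, the default $\kappa = 1$ and the truncation rule are assumptions made here for illustration); it sums $\frac{4}{\sqrt{2\pi}}\sum_m \exp(-x_m^2/2)/x_m$ with $x_m = d(1+m)^{\kappa-1/2}$.

```python
import math


def tail_series_bound(d, kappa=1.0, max_terms=100_000):
    """Evaluate (4/sqrt(2 pi)) * sum_m exp(-x_m^2/2)/x_m, x_m = d (1+m)^(kappa-1/2).

    The sum is truncated after max_terms terms or once a term is numerically
    negligible; for kappa close to 1/2 the truncation may matter.
    """
    if kappa <= 0.5 or d <= 0:
        raise ValueError("need kappa > 1/2 and d > 0")
    total = 0.0
    for m in range(max_terms):
        x = d * (1.0 + m) ** (kappa - 0.5)
        term = math.exp(-0.5 * x * x) / x
        total += term
        if term < 1e-300:  # remaining terms are numerically negligible
            break
    return 4.0 / math.sqrt(2.0 * math.pi) * total


if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0, 8.0):
        print(d, tail_series_bound(d))
```

For small $d$ the value exceeds one and the bound is vacuous; the lemma only needs the behavior as $d \to \infty$.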






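Now define $E_{n,1} = \big\{\sup_{s\in\mathbb{R}}\frac{|W(s)|}{(1+|s|)^\kappa} \le d\big\}$ for some fixed $d > 0$ so that $E_{n,1}$ has large probability.
By integration by parts and a straightforward calculation,

$L_z^{(n)} - Y_{1,z}^{(n)} = h^{-1}I(z)^{1/2}\Big(\int_{-\infty}^0 + \int_1^\infty\Big)\omega((z-t)/h)\,dW(t)$
$\qquad = h^{-1}I(z)^{1/2}\Big(W(t)\omega((z-t)/h)\Big|_{t=-\infty}^{0} + \int_{-\infty}^0 W(t)\omega'((z-t)/h)h^{-1}\,dt\Big)$
$\qquad\quad + h^{-1}I(z)^{1/2}\Big(W(t)\omega((z-t)/h)\Big|_{t=1}^{\infty} + \int_1^\infty W(t)\omega'((z-t)/h)h^{-1}\,dt\Big).$

On $E_{n,1}$, for any $z$, $|W(z)| \le d(1+|z|)^\kappa$. By assumption (5.1), $|\omega((z-t)/h)| \le C_\omega\exp(-|z-t|/(hC_3))$
and $|\omega'((z-t)/h)| \le C_\omega\exp(-|z-t|/(hC_3))$. Thus we have, for any fixed $z$, as $|t| \to \infty$,

$|W(t)\omega((z-t)/h)| \le dC_\omega(1+|t|)^\kappa\exp(-|z-t|/(hC_3)) \to 0.$

Meanwhile, on $E_{n,1}$, $|W(1)\omega((z-1)/h)| \le 2^\kappa dC_\omega\exp(-h^{\varphi-1}/C_3)$, and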

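$\Big|\int_1^\infty W(t)\omega'((z-t)/h)\,dt\Big| \le \int_1^\infty d(1+t)^\kappa\cdot C_\omega\exp(-|z-t|/(C_3h))\,dt$
$\qquad = \int_1^\infty d(1+t)^\kappa\cdot C_\omega\exp(-(t-z)/(C_3h))\,dt$
$\qquad = \int_{1-z}^\infty d(1+t+z)^\kappa\cdot C_\omega\exp(-t/(C_3h))\,dt$
$\qquad \le \int_{h^\varphi}^\infty d(2+t)^\kappa\cdot C_\omega\exp(-t/(C_3h))\,dt$
$\qquad = h^\varphi\int_1^\infty d(2+h^\varphi t)^\kappa\cdot C_\omega\exp(-t/(C_3h^{1-\varphi}))\,dt$
$\qquad \le h^\varphi\int_1^\infty d(2+t)^\kappa\cdot C_\omega(t/(C_3h^{1-\varphi}))^{-a}\,dt$
$\qquad \le dC_\omega C_3^a h^{\varphi+a(1-\varphi)}\int_1^\infty (2+t)^\kappa t^{-a}\,dt = O(h^{\varphi+a(1-\varphi)}) = O(h^3),$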



where $a$ is a constant with $a > \kappa + 2$ and $\varphi + a(1-\varphi) > 3$. Using a similar technique, one can show
that on $E_{n,1}$, $\big|\int_{-\infty}^0 W(t)\omega'((z-t)/h)\,dt\big| = O(h\exp(-h^{\varphi-1}/C_3))$. Consequently,

(A.29) $\quad \sup_{h^\varphi \le z \le 1-h^\varphi}|L_z^{(n)} - Y_{1,z}^{(n)}| = O_P(h).$



Since $h^{1/2}L_z^{(n)}I(z)^{-1/2}/\sigma_\omega = h^{-1/2}\int_{\mathbb{R}}\omega((t-z)/h)\,dW(t)/\sigma_\omega$ is stationary Gaussian with mean
zero, the process $h^{1/2}L_{hz}^{(n)}I(hz)^{-1/2}/\sigma_\omega$ is Gaussian with mean zero and covariance function
$\int_{\mathbb{R}}\omega(t)\omega(t+\cdot)\,dt/\sigma_\omega^2$. Then by [5], we have as $n \to \infty$,

$P\Big((2\delta\log n)^{1/2}\Big\{\sup_{h^\varphi \le z \le 1-h^\varphi}|h^{1/2}L_z^{(n)}I(z)^{-1/2}\sigma_\omega^{-1}| - d_n\Big\} \le u\Big)$
$\qquad = P\Big((2\delta\log n)^{1/2}\Big\{\sup_{0 \le z \le 1-2h^\varphi}|h^{1/2}L_z^{(n)}I(z)^{-1/2}\sigma_\omega^{-1}| - d_n\Big\} \le u\Big)$
$\qquad = P\Big((2\delta\log n)^{1/2}\Big\{\sup_{0 \le z \le h^{-1}(1-2h^\varphi)}|h^{1/2}L_{hz}^{(n)}I(hz)^{-1/2}\sigma_\omega^{-1}| - d_n\Big\} \le u\Big)$
(A.30) $\qquad \to \exp(-\exp(-2u)),$

where $\sigma_\omega = (\int_{\mathbb{R}}\omega(u)^2\,du)^{1/2}$. By the assumptions $C > (3\delta+1)C_1$, $m > (3+\sqrt{5})/4$ and $0 < \delta < 2m/(8m-1)$,
the remainders in (A.20), (A.22)-(A.29) are all $o_P((h\log n)^{-1/2})$. Thus the desired
conclusion holds.

A.9. Proof of Theorem 5.3. For simplicity, denote $\hat g = \hat g_{n,\lambda}$ and $g = \hat g - g_0$. Using arguments
similar to (A.12), (A.13) and (A.16), and by the assumptions $a_n = o(r_n)$, $nr_na_n\log n = o(h^{-1/2})$,
$nr_n^3h^{-1/2}\log n = o(nr_na_n\log n) = o(h^{-1/2})$, it can be shown that

(A.31) $\quad -2n\cdot PLRT_{n,\lambda} = n\|\hat g - g_0\|^2 + O_P(nr_na_n\log n + nr_n^3h^{-1/2}\log n) = n\|\hat g - g_0\|^2 + o_P(h^{-1/2}).$

Under the hypothesis $H_0^{global}$ that $g_0$ is the "true" parameter, by Theorem 3.4, we have
$\|\hat g - g_0 - S_{n,\lambda}(g_0)\| = O_P(a_n\log n)$, where $a_n$ is defined as in (3.5). It thus follows from $n^{1/2}a_n\log n = o(1)$
that $n^{1/2}\|\hat g - g_0\| = n^{1/2}\|S_{n,\lambda}(g_0)\| + o_P(1)$.

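Next we study the leading term $\|S_{n,\lambda}(g_0)\|$. We first approximate $\|W_\lambda g_0\|$. By Proposition 2.1
and the dominated convergence theorem, it can be established that

(A.32) $\quad \|W_\lambda g_0\|^2 = o(\lambda).$

To see (A.32), define $f_\lambda(\nu) = |V(g_0,h_\nu)|^2\gamma_\nu\frac{\lambda\gamma_\nu}{1+\lambda\gamma_\nu}$ for $\nu = 0, 1, \ldots$, $\lambda > 0$. Then $f_\lambda$ is a sequence
of functions satisfying $|f_\lambda(\nu)| \le |V(g_0,h_\nu)|^2\gamma_\nu \equiv f(\nu)$. From $g_0 \in H^m(\mathbb{I})$, $\sum_{\nu\in\mathbb{N}}|V(g_0,h_\nu)|^2\gamma_\nu = \int_{\mathbb{N}}f(\nu)\,dm(\nu) < \infty$,
where recall that $\mathbb{N} = \{0, 1, 2, \ldots\}$ and $m(\cdot)$ denotes the discrete measure over
$\mathbb{N}$. So $f$ is an integrable function over $\mathbb{N}$ which dominates $f_\lambda(\nu)$. Since $\lim_{\lambda\to 0}f_\lambda(\nu) = 0$, from the
Lebesgue dominated convergence theorem $\sum_\nu|V(g_0,h_\nu)|^2\gamma_\nu\frac{\lambda\gamma_\nu}{1+\lambda\gamma_\nu} = \int_{\mathbb{N}}f_\lambda(\nu)\,dm(\nu) \to 0$. That is,
$\|W_\lambda g_0\|^2 = \sum_\nu|V(g_0,h_\nu)|^2\frac{\lambda^2\gamma_\nu^2}{1+\lambda\gamma_\nu} = o(\lambda)$.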
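
The dominated convergence argument behind (A.32) can be illustrated numerically. The Python sketch below assumes, purely for illustration, eigenvalues $\gamma_\nu = \nu^{2m}$ and Fourier coefficients with $|V(g_0,h_\nu)|^2 = \nu^{-(2m+2)}$, so that $\sum_\nu |V(g_0,h_\nu)|^2\gamma_\nu = \sum_\nu \nu^{-2} < \infty$; the function name and the truncation level are hypothetical. It returns a truncated version of $\|W_\lambda g_0\|^2/\lambda = \sum_\nu f_\lambda(\nu)$.

```python
def w_lambda_ratio(lam, m=2, n_terms=5000):
    """Truncated sum of f_lambda(nu) = |V|^2 gamma * lam*gamma / (1 + lam*gamma).

    Assumes gamma_nu = nu^(2m) and |V(g0, h_nu)|^2 = nu^(-(2m+2)) for nu >= 1,
    so the value approximates ||W_lambda g0||^2 / lambda.
    """
    total = 0.0
    for nu in range(1, n_terms + 1):
        gamma = float(nu) ** (2 * m)
        coef_sq = float(nu) ** (-(2 * m + 2))
        total += coef_sq * gamma * (lam * gamma) / (1.0 + lam * gamma)
    return total


if __name__ == "__main__":
    for lam in (1e-2, 1e-4, 1e-6, 1e-8):
        print(lam, w_lambda_ratio(lam))
```

Each summand is increasing in $\lambda$ and bounded by $\nu^{-2}$, so the printed ratios decrease toward zero as $\lambda \to 0$, which is the content of $\|W_\lambda g_0\|^2 = o(\lambda)$ in this example.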

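By (2.12), $n\|S_{n,\lambda}(g_0)\|^2 = n^{-1}\|\sum_{i=1}^n \epsilon_iK_{Z_i}\|^2 - 2\sum_{i=1}^n \epsilon_i(W_\lambda g_0)(Z_i) + n\|W_\lambda g_0\|^2$. It follows by
the Fourier expansion of $g_0$ and Proposition 2.1 that

$E\Big\{\Big|\sum_{i=1}^n \epsilon_i(W_\lambda g_0)(Z_i)\Big|^2\Big\} = nE\{\epsilon^2|(W_\lambda g_0)(Z)|^2\} = nV(W_\lambda g_0, W_\lambda g_0) = n\sum_\nu|V(g_0,h_\nu)|^2\Big(\frac{\lambda\gamma_\nu}{1+\lambda\gamma_\nu}\Big)^2 = o(n\lambda),$

where the last equality follows by $\sum_\nu|V(g_0,h_\nu)|^2\gamma_\nu < \infty$ and the dominated convergence theorem; see
(A.32) for similar arguments examining $\|W_\lambda g_0\|$. So $\sum_{i=1}^n \epsilon_i(W_\lambda g_0)(Z_i) = o_P((n\lambda)^{1/2}) = o_P(h^{-1/2})$.
By (A.32), $n\|W_\lambda g_0\|^2 = o(n\lambda)$. Consequently, $n\|S_{n,\lambda}(g_0)\|^2 = n^{-1}\|\sum_{i=1}^n \epsilon_iK_{Z_i}\|^2 + n\|W_\lambda g_0\|^2 + o_P(h^{-1/2}) = n^{-1}\|\sum_{i=1}^n \epsilon_iK_{Z_i}\|^2 + o(n\lambda) + o_P(h^{-1/2})$.
In what follows, we study the limiting property of $n^{-1}\|\sum_{i=1}^n \epsilon_iK_{Z_i}\|^2$.

Write $n^{-1}\|\sum_{i=1}^n \epsilon_iK_{Z_i}\|^2 = n^{-1}\sum_{i=1}^n \epsilon_i^2K(Z_i,Z_i) + n^{-1}W(n)$, where $W(n) = \sum_{i\ne j}\epsilon_i\epsilon_jK(Z_i,Z_j)$.
If we denote $W_{ij} = 2\epsilon_i\epsilon_jK(Z_i,Z_j)$, then we can rewrite $W(n)$ as $\sum_{1\le i<j\le n}W_{ij}$ so that $W(n)$ is
clean (see [15]). Next we will derive the limiting distribution for $W(n)$. Let $\sigma(n)^2 = Var(W(n))$,
and $G_I$, $G_{II}$, $G_{IV}$ be defined as

$G_I = \sum_{i<j}E\{W_{ij}^4\},$
$G_{II} = \sum_{i<j<k}\big(E\{W_{ij}^2W_{ik}^2\} + E\{W_{ji}^2W_{jk}^2\} + E\{W_{ki}^2W_{kj}^2\}\big)$, and
$G_{IV} = \sum_{i<j<k<l}\big(E\{W_{ij}W_{ik}W_{lj}W_{lk}\} + E\{W_{ij}W_{il}W_{kj}W_{kl}\} + E\{W_{ik}W_{il}W_{jk}W_{jl}\}\big).$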

It follows by Proposition 3.2 of [15] that, to show $\sigma(n)^{-1}W(n) \stackrel{d}{\to} N(0,1)$, it is sufficient to show that
$G_I$, $G_{II}$, $G_{IV}$ are of lower order than $\sigma(n)^4$. By the assumption $E\{\epsilon^4|Z\} \le C$, a.s., we have
$E\{\epsilon^4|Z\} \le C \le CC_2I(Z)$, a.s. It then follows from (A.5) that $E\{W_{ij}^4\} = 16E\{\epsilon_i^4\epsilon_j^4K(Z_i,Z_j)^4\} = O(h^{-4})$,
implying $G_I = O(n^2h^{-4})$. Obviously, $E\{W_{ij}^2W_{ik}^2\} \le E\{W_{ij}^4\} = O(h^{-4})$, implying $G_{II} = O(n^3h^{-4})$.
To approximate $G_{IV}$, for pairwise different $i, j, k, l$, from direct examinations, we have

$E\{W_{ij}W_{ik}W_{lj}W_{lk}\} = 16E\{\epsilon_i^2\epsilon_j^2\epsilon_k^2\epsilon_l^2K(Z_i,Z_j)K(Z_i,Z_k)K(Z_l,Z_j)K(Z_l,Z_k)\} = 16\sum_\nu\frac{1}{(1+\lambda\gamma_\nu)^4} = O(h^{-1}).$

Therefore, $G_{IV} = O(n^4h^{-1})$.

Next we obtain the exact order of $\sigma(n)^4$, which is $n^4h^{-2}$. This follows from the observation
$E\{W_{ij}^2\} = 4E\{\epsilon_i^2\epsilon_j^2K(Z_i,Z_j)^2\} = 4h^{-1}\rho_K^2$. Thus, $\sigma(n)^4 = \big(\sum_{i<j}E\{W_{ij}^2\}\big)^2$ has the same order as
$4n^4h^{-2}\rho_K^4$. It follows by $h = o(1)$ and $(nh^2)^{-1} = o(1)$ that $G_I$, $G_{II}$ and $G_{IV}$ are of lower order
than $\sigma(n)^4$, which implies by Proposition 3.2 of [15] that

(A.33) $\quad \frac{1}{\sqrt{2h^{-1}}\,n\rho_K}W(n) \stackrel{d}{\to} N(0,1).$
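
The order comparison behind (A.33) reduces to three ratios: with $\sigma(n)^4 \asymp n^4h^{-2}$, the bounds $G_I = O(n^2h^{-4})$, $G_{II} = O(n^3h^{-4})$ and $G_{IV} = O(n^4h^{-1})$ give ratios of order $(nh)^{-2}$, $(nh^2)^{-1}$ and $h$. A short Python sketch, assuming for illustration a bandwidth $h = n^{-d}$ (the helper name and the values of $n$ and $d$ are hypothetical), shows these ratios shrinking when $0 < d < 1/2$:

```python
def de_jong_ratios(n, d):
    """Orders of G_I, G_II, G_IV relative to sigma(n)^4 when h = n^(-d)."""
    h = n ** (-d)
    return {
        "G_I": (n * h) ** -2,      # n^2 h^-4 / (n^4 h^-2)
        "G_II": 1.0 / (n * h * h),  # n^3 h^-4 / (n^4 h^-2)
        "G_IV": h,                  # n^4 h^-1 / (n^4 h^-2)
    }


if __name__ == "__main__":
    for n in (1e3, 1e6, 1e9):
        print(n, de_jong_ratios(n, d=0.2))
```

The middle ratio is the binding one: it vanishes exactly when $nh^2 \to \infty$, matching the condition $(nh^2)^{-1} = o(1)$ used in the proof.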

To conclude, we approximate the term $\sum_{i=1}^n \epsilon_i^2K(Z_i,Z_i)$. By $E\{\epsilon^4|Z\} \le C$, a.s., we have $E\{\epsilon^4K(Z,Z)^2\} = O(h^{-2})$.
Therefore, a direct calculation leads to $E\{|\sum_{i=1}^n [\epsilon_i^2K(Z_i,Z_i) - h^{-1}\sigma_K^2]|^2\} \le nE\{\epsilon^4K(Z,Z)^2\} = O(nh^{-2})$,
where recall that $\sigma_K^2 = hE\{\epsilon^2K(Z,Z)\}$. This implies $\sum_{i=1}^n [\epsilon_i^2K(Z_i,Z_i) - h^{-1}\sigma_K^2] = O_P(n^{1/2}h^{-1})$. Therefore,

(A.34) $\quad n^{-1}\sum_{i=1}^n \epsilon_i^2K(Z_i,Z_i) = h^{-1}\sigma_K^2 + O_P(n^{-1/2}h^{-1}) = h^{-1}\sigma_K^2 + O_P(1).$

From (A.33) and (A.34), $(h/n)\|\sum_{i=1}^n \epsilon_iK_{Z_i}\|^2 = \sigma_K^2 + o_P(1)$, implying $n\|S_{n,\lambda}(g_0)\|^2 = O_P(h^{-1} + n\lambda + h^{-1/2}) = O_P(h^{-1})$,
and hence $n^{1/2}\|S_{n,\lambda}(g_0)\| = O_P(h^{-1/2})$. Thus,

$-2n\cdot PLRT_{n,\lambda} = n\|\hat g - g_0\|^2 + o_P(h^{-1/2})$
$\qquad = \big(n^{1/2}\|S_{n,\lambda}(g_0)\| + o_P(1)\big)^2 + o_P(h^{-1/2})$
$\qquad = n\|S_{n,\lambda}(g_0)\|^2 + 2n^{1/2}\|S_{n,\lambda}(g_0)\|\cdot o_P(1) + o_P(h^{-1/2})$
(A.35) $\qquad = n^{-1}\Big\|\sum_{i=1}^n \epsilon_iK_{Z_i}\Big\|^2 + n\|W_\lambda g_0\|^2 + o_P(h^{-1/2}).$

It follows by (A.33)-(A.35) and Slutsky's theorem that
$(2h^{-1}\sigma_K^4/\rho_K^2)^{-1/2}\big(-2nr_K\cdot PLRT_{n,\lambda} - nr_K\|W_\lambda g_0\|^2 - h^{-1}\sigma_K^4/\rho_K^2\big) \stackrel{d}{\to} N(0,1)$.

A. 10. Proof of Lemma 6.1. We need the following two inequalities in establishing (6.3): 

°° y.27rftt(fc+l) 1 oo 



/■oo i ^ rZirll' (K+l) | ^ 



, o — ; :Q (1 + (2^)- r 

and by a similar argument, J °° (1+ J 2 m)i ^ > Efcli (i+^fetl)^)' ■ This completes the proof. 
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In this document, we give the proofs of several results that were not included in Appendix. 
We also give a minimax rate result of PLRT testing in the more general modeling framework. 
The reference labels of the equations, Theorems, Propositions and Lemmas in this document are 
consistent with those in the main text of the paper. 

We organize this document as follows. In Section S.1, we prove Proposition 3.3, i.e., the rates of
convergence of $\hat g_{n,\lambda}$. In Section S.2, we prove Corollary 3.7 on the pointwise asymptotic normality of
$\hat g_{n,\lambda}$ in a special setting. In Section S.3, we sketch the proof of another technical tool, the restricted
FBR, which is used to establish the asymptotic null distribution of the local LRT test. In Section
S.4, we prove Corollary 4.5. In Section S.5, we prove Theorem 4.6, that is, our local LRT attains
minimum separation rates in a general framework. In Section S.6, we prove Proposition 5.2, i.e.,
the equivalent kernel conditions for the cubic spline. In Section S.7, we prove Theorem 5.4, i.e., when
data are normal, the PLRT attains minimax rates of testing. We further extend this result to a
more general modeling framework in Section S.8.

S.1. Proof of Proposition 3.3. To prove Proposition 3.3, we first need the following Lemma.
Denote $N(\delta, \mathcal{G}, \|\cdot\|_{\sup})$ as the $\delta$-covering number of the function class $\mathcal{G}$ in terms of the uniform
norm.

Lemma S.1. Suppose that $c_m^{-2}h\lambda^{-1} > 1$. Then for any $\delta > 0$, $\log N(\delta, \mathcal{G}, \|\cdot\|_{\sup}) \le c\,c_m^{-1/m}(h\lambda^{-1})^{1/(2m)}\delta^{-1/m}$,
where $c > 0$ is a universal constant.

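Proof of Lemma S.1. Note that by $c_m^{-2}h\lambda^{-1} > 1$,

$\mathcal{G} = (c_m^{-2}h\lambda^{-1})^{1/2}\cdot\{g \in H^m(\mathbb{I}) : \|g\|_{\sup} \le (c_m^{-2}h\lambda^{-1})^{-1/2},\ J(g,g) \le 1\} \subset (c_m^{-2}h\lambda^{-1})^{1/2}\mathcal{T},$

where $\mathcal{T} = \{g \in H^m(\mathbb{I}) : \|g\|_{\sup} \le 1,\ J(g,g) \le 1\}$. So by [31],

$\log N(\delta, \mathcal{G}, \|\cdot\|_{\sup}) \le \log N(\delta, (c_m^{-2}h\lambda^{-1})^{1/2}\mathcal{T}, \|\cdot\|_{\sup})$
$\qquad = \log N((c_m^{-2}h\lambda^{-1})^{-1/2}\delta, \mathcal{T}, \|\cdot\|_{\sup})$
$\qquad \le c\big((c_m^{-2}h\lambda^{-1})^{-1/2}\delta\big)^{-1/m} = c\,c_m^{-1/m}(h\lambda^{-1})^{1/(2m)}\delta^{-1/m}.$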

□ 
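
The last display in the proof of Lemma S.1 is a pure rescaling identity: covering a class scaled by $A^{1/2}$ at radius $\delta$ is the same as covering the unscaled class at radius $A^{-1/2}\delta$, with $A = c_m^{-2}h\lambda^{-1}$. The Python sketch below checks numerically that the two forms of the entropy bound coincide; the function names and all numerical values are illustrative assumptions, and the universal constant $c$ is set arbitrarily.

```python
import math


def entropy_bound_rescaled(delta, h, lam, m, c=1.0, c_m=1.0):
    """c * (A^{-1/2} * delta)^{-1/m} with A = c_m^{-2} * h / lam."""
    A = c_m ** -2 * h / lam
    if A <= 1.0:
        raise ValueError("Lemma S.1 assumes c_m^{-2} h / lambda > 1")
    return c * (A ** -0.5 * delta) ** (-1.0 / m)


def entropy_bound_closed_form(delta, h, lam, m, c=1.0, c_m=1.0):
    """c * c_m^{-1/m} * (h/lam)^{1/(2m)} * delta^{-1/m}."""
    return c * c_m ** (-1.0 / m) * (h / lam) ** (1.0 / (2 * m)) * delta ** (-1.0 / m)


if __name__ == "__main__":
    args = dict(delta=0.1, h=0.05, lam=1e-4, m=2, c=1.5, c_m=0.8)
    print(entropy_bound_rescaled(**args), entropy_bound_closed_form(**args))
```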



Consider the function class $\mathcal{T} = \{g \in H^m(\mathbb{I}) : \|g\|_{\sup} \le 1,\ J(g,g) \le 1\}$. By Lemma S.1, for any
$\delta > 0$, $\log N(\delta, \mathcal{T}, \|\cdot\|_{\sup}) \le c\delta^{-1/m}$, where $c$ is some universal constant. Then a modification of
Lemma 3.2 leads to

Lemma S.2. Suppose that $\psi_n$ satisfies Lipschitz continuity, namely,

(S.1) $\quad |\psi_n(T;f) - \psi_n(T;g)| \le c_m^{-1}h^{1/2}\|f - g\|_{\sup}$, for all $f, g \in \mathcal{T}$,

where $c_m$ is specified in Lemma 3.1. Then we have

$\lim_{n\to\infty}P\Big(\sup_{g\in\mathcal{T},\ \|g\|_{\sup}\le 1}\frac{n^{1/2}\|Z_n(g)\|}{n^{1/2}\|g\|_{\sup}^{1-1/(2m)} + 1} \le (5\log\log n)^{1/2}\Big) = 1,$

where the empirical process $Z_n(f)$ is defined in (3.1).

Denote $g = \hat g_{n,\lambda} - g_0$. By consistency of $\hat g_{n,\lambda}$ in the $\|\cdot\|_{\mathcal{H}}$-norm and the Sobolev embedding theorem
(see [1]), we know that $\hat g_{n,\lambda}(z)$ falls in $\mathcal{I}$ for any $z \in \mathbb{I}$ and large enough $n$. By Taylor's expansion,

$\ell_{n,\lambda}(g_0 + g) - \ell_{n,\lambda}(g_0) = S_{n,\lambda}(g_0)g + \frac{1}{2}DS_{n,\lambda}(g_0)gg + \frac{1}{6}D^2S_{n,\lambda}(g^*)ggg \ge 0,$

where $g^* = g_0 + t^*g$ for some $t^* \in [0,1]$. Denote the three sums on the right side of the above equation
by $I_1, I_2, I_3$. Next we will study the rates for these terms. Denote $A_i = \{\sup_{a\in\mathcal{I}}|\ddot\ell_a(Y_i;a)| + \sup_{a\in\mathcal{I}}|\ell_a'''(Y_i;a)| \le C\log n\}$.
By (2.4), we may choose $C$ to be large so that $\cap_iA_i$ has large probability,
and $P(A_i^c) = O(n^{-2})$. Then on $\cap_iA_i$,


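$|6I_3| \le \frac{1}{n}\sum_{i=1}^n\sup_{a\in\mathcal{I}}|\ell_a'''(Y_i;a)|\cdot|g(Z_i)|^3$
$\qquad \le \frac{1}{n}\|g\|_{\sup}\sum_{i=1}^n\sup_{a\in\mathcal{I}}|\ell_a'''(Y_i;a)|\cdot g(Z_i)^2$
$\qquad = \frac{1}{n}\|g\|_{\sup}\Big\langle\sum_{i=1}^n\psi(T_i;g)K_{Z_i},\, g\Big\rangle$
$\qquad = \frac{1}{n}\|g\|_{\sup}\Big\langle\sum_{i=1}^n[\psi(T_i;g)K_{Z_i} - E\{\psi(T;g)K_Z\}],\, g\Big\rangle + \|g\|_{\sup}E\{\psi(T;g)g(Z)\},$

where $\psi(T_i;g) = \sup_{a\in\mathcal{I}}|\ell_a'''(Y_i;a)|\,g(Z_i)I_{A_i}$. Let $\psi_n(T_i;g) = (C\log n)^{-1}c_m^{-1}h^{1/2}\psi(T_i;g)$, which satisfies
(S.1). Thus, by Lemma S.2, for large $n$ and with large probability,

$\Big\|\sum_{i=1}^n[\psi_n(T_i;g)K_{Z_i} - E\{\psi_n(T;g)K_Z\}]\Big\| \le \big(n^{1/2}\|g\|_{\sup}^{1-1/(2m)} + 1\big)(5\log\log n)^{1/2}.$

So by Cauchy's inequality,

$\Big|\Big\langle\sum_{i=1}^n[\psi_n(T_i;g)K_{Z_i} - E\{\psi_n(T;g)K_Z\}],\, g\Big\rangle\Big| \le \|g\|\cdot\big(n^{1/2}\|g\|_{\sup}^{1-1/(2m)} + 1\big)(5\log\log n)^{1/2}.$

On the other hand, by Assumption A.1 (a),

$E\{\psi(T;g)g(Z)\} \le E\{\sup_{a\in\mathcal{I}}|\ell_a'''(Y;a)|\,g(Z)^2\} \le 2C_2C_1\|g\|^2.$

By $(n^{1/2}h)^{-1}(\log\log n)^{m/(2m-1)}(\log n)^{2m/(2m-1)} = o(1)$, which implies $(n^{1/2}h)^{-1}(\log\log n)^{1/2}\log n = o(1)$, we have

$|6I_3| \le \frac{1}{n}\|g\|_{\sup}\cdot\|g\|(C\log n)c_mh^{-1/2}\big(n^{1/2}\|g\|_{\sup}^{1-1/(2m)} + 1\big)(5\log\log n)^{1/2} + 2C_2C_1\|g\|_{\sup}\cdot\|g\|^2$
$\qquad \le c_mC'(n^{1/2}h)^{-1}(\log\log n)^{1/2}(\log n)\|g\|^2 + 2C_2C_1\|g\|_{\sup}\cdot\|g\|^2$
(S.2) $\qquad = o_P(1)\cdot\|g\|^2.$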

To approximate I2, by Cauchy's inequality we have 



E{£ a (Y;g (Z))I A .g(Z) 2 } 



< E{\£ a (Y;g (Z))\ 2 I A cg(Z)^ 2 • P(^) 1 / 2 

< 0(1) • (log^ll^lls.pll^lln- 1 = IHPOIW- 1 ^) = o( i)|| 5 ||2. 



By changing $\psi$ and $\psi_n$ in the proof of (S.2) to $\psi(T_i;g) = \ddot\ell_a(Y_i;g_0(Z_i))g(Z_i)I_{A_i}$ and $\psi_n(T_i;g) = (C\log n)^{-1}c_m^{-1}h^{1/2}\psi(T_i;g)$,
and using an argument similar to the proof of (S.2), we have

$|[DS_{n,\lambda}(g_0) - E\{DS_{n,\lambda}(g_0)\}]gg| \le Cc_mh^{-1+1/(4m)}n^{-1/2}(\log\log n)^{1/2}(\log n)\|g\|^{2-1/(2m)}$
$\qquad\quad + Cc_m(nh^{1/2})^{-1}(\log\log n)^{1/2}(\log n)\|g\| + o_P(1)\|g\|^2.$

Thus,

(S.3) $\quad 2I_2 \le -\|g\|^2 + Cc_mh^{-1+1/(4m)}n^{-1/2}(\log\log n)^{1/2}(\log n)\|g\|^{2-1/(2m)}$
$\qquad\quad + Cc_m(nh^{1/2})^{-1}(\log\log n)^{1/2}(\log n)\|g\| + o_P(1)\|g\|^2.$

Note that $E\{\|\sum_{i=1}^n\epsilon_iK_{Z_i}\|^2\} = O(nh^{-1})$; by (A.32) in the main paper, we have

(S.4) $\quad \|S_{n,\lambda}(g_0)\| = O_P((nh)^{-1/2} + \lambda^{1/2}).$

Combining (S.2), (S.3), and (S.4), and by $(nh^{1/2})^{-1}(\log\log n)^{1/2}(\log n) = o((nh)^{-1/2})$, we have
for some large $C'$

$(1 + o_P(1))\|g\|^2 \le C'((nh)^{-1/2} + \lambda^{1/2})\|g\| + Cc_mh^{-1+1/(4m)}n^{-1/2}(\log\log n)^{1/2}(\log n)\|g\|^{2-1/(2m)}.$

Solving this inequality, and using $(n^{1/2}h)^{-1}(\log\log n)^{m/(2m-1)}(\log n)^{2m/(2m-1)} = o(1)$, we get
$\|g\| = O_P((nh)^{-1/2} + \lambda^{1/2})$.

S.2. Proof of Corollary 3.7. By Proposition 2.2, Assumption A.2 holds. We first show part (i).
By $\ell_a'''(y;a) = 0$ for any $y$ and $a$, that is, in (3.5) $C_\ell = 0$, we obtain
$a_n = n^{-1/2}((nh)^{-1/2} + h^m)h^{-(6m-1)/(4m)}(\log\log n)^{1/2}$. Since $h \asymp n^{-1/(4m+1)}$, we have $h = o(1)$ and $nh^2 \to \infty$. By
$m > (3+\sqrt{5})/4$, it can be verified that $a_n\log n = o(n^{-1/2})$.
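
The role of the threshold $m > (3+\sqrt{5})/4$ can be seen by tracking exponents of $n$. With $h = n^{-1/(4m+1)}$ and $a_n = n^{-1/2}((nh)^{-1/2} + h^m)h^{-(6m-1)/(4m)}(\log\log n)^{1/2}$, the quantity $n^{1/2}a_n$ is, up to logarithmic factors, a power of $n$ whose exponent works out to $(-4m^2 + 6m - 1)/(4m(4m+1))$. The following Python sketch (the function name is a hypothetical helper, and logarithmic factors are ignored) computes this exponent directly from the two rate terms:

```python
import math


def fbr_remainder_exponent(m):
    """Exponent e with n^{1/2} a_n ~ n^e (log factors ignored), h = n^{-1/(4m+1)}."""
    d = 1.0 / (4.0 * m + 1.0)
    stochastic = -(1.0 - d) / 2.0   # exponent of (nh)^{-1/2}
    bias = -m * d                   # exponent of h^m
    # The larger of the two exponents dominates the sum (nh)^{-1/2} + h^m;
    # h^{-(6m-1)/(4m)} contributes d * (6m - 1) / (4m).
    return max(stochastic, bias) + d * (6.0 * m - 1.0) / (4.0 * m)


if __name__ == "__main__":
    threshold = (3.0 + math.sqrt(5.0)) / 4.0
    for m in (1.2, threshold, 2.0, 3.0):
        print(m, fbr_remainder_exponent(m))
```

The exponent is negative, so that $a_n\log n = o(n^{-1/2})$, exactly when $4m^2 - 6m + 1 > 0$, i.e. when $m$ exceeds $(3+\sqrt{5})/4 \approx 1.309$.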

On the other hand, by expression of K in terms of h u s (see Proposition 2.1), as h — > 0, 

[\j? m \z)K(z ,z)dz- gj? m \zo)/ir(zo) = £ — ^— ^tf m) A, MM*b) - E y ™ } A- MM*>) 
(S.5) = -J^-^^Ftf^KM^C^-^O, 
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where the limit in (S.5) follows from s ^ JU \V{g^ m \h v )h v {zQ)\ < oo and dominated convergence 
theorem. Then, by (3.11) and integration by parts, it can be shown that 



(W x g )(z ) = (W X90 ,K Z0 ) = XJ(g ,K 



(S.6) = (-l) m h 2m I g^ m) (z)K(z ,z)dz=(-irh 2m (g^ m \z )/n(z ) + o(l)). 

Jo 

So, as n — > oo, {nh) 1 / 2 (W\go)(zo) — > {—l) m g ( Q 2m \zo) / t^{zq) . Therefore all the assumptions in The- 
orem 3.6 hold. Then (3.12) directly follows from (3.10). 

The proof of (3.13) is similar to that of (3.12). One only notes, by (S.6) and h x n~ d for 
4^T < d < g^j, (nh) l / 2 {W x g ){zv) = 0((n/ l ) 1 /2/ l 2m ) = o(1) . Then (3 . 13 ) follows from ( 3 . 10 ). 

The proof of part (ii) is similar in spirit to that of part (i). The only difference is that since go 
does not satisfy the boundary conditions, and by integration by parts, (S.6) should be replaced by 
the following 
(S.7) 



(w xgo )(z ) = h 2m j2(- i y~ 1 



3=1 



|— ^•(,))- 5 r- i) (,) 



+ (-l) m h 2m I g ( Q 2m \z)K(z ,z)dz. 



o 



The first sum, by (3.14), is o(h 2m ). The second sum, by (S.5), is {-l) m h 2m (g^ 2m) (z )/tt(z ) +o(l)). 
Thus, (W x g )(zo) = (-l) m h 2m g { Q 2m) (z )/ir(z ) + o{h 2m ). Note this is not true for z = or 1. Then 
the proof can be finished by similar arguments in the proof of part (i) . 

S.3. Proof of Theorem 4-3. The proof is similar to those in Theorem 3.4, so we only sketch 
the idea. Let g = A — g®. Assumption A. 4 guarantees that with large probability, \\g\\ < r n = 

M({nh)-^ 2 + h m ) for a proper large M. By a modification of the proof of Lemma 3.2, we have the 
following lemma. 

Lemma S.3. Suppose that ip n satisfies Lipschitz continuity, namely, there exists a constant 
C^p > such that 

(S.8) \ipn(T;gi)-ipn(T;g2)\<c^-h 1 ' 2 \\gi-g2\\ B u P , for all g Xy 92 € M , 

where recall that T = (Y, Z) denotes the full data variable. Then we have 
( 



lim P 

n— »oo 



sup \\Zl(9)\\ < ( 5 log log n) 1 ^ 

, 9 eGa n 1 /2/ t -(2m-l)/(4m)|| 5 ||l-V(2m) + 1 

\llsl|sup<l 



where Q Q = {g £ WollMlsup < W(<7,<?) < c^hX' 1 } and Z a n {g) = ^ =1 [^ n (Tf, g)K* z -E{^ n (T; g)K* z }}. 

By a reexamination of the proof of Theorem 3.4, we have, with large probability, g £ Qq and 
ip n satisfies Lipschitz continuity (S.8), where ip n (T;g) = C~ 1 c^(logn)~ 1 h 1 / 2 d~ 1 {£ a (Y; go(Z) + 
d n g(Z)) — £ a (Y ; go(Z))} , and d n = Cmr n h~ l l 2 . This leads to, with large probability, 

n 

(S.9) || YSMTi\g)K* Zi - E{MT;g)K* z }}\\ < (n^-^-D/^) + 1)(51oglogn) i/2. 

i=l 

The remainder of the proof follows by (A. 7), and by an argument similar to (A. 9) - (A. 11). 
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5.4. Proof of Corollary 4-5. By Fourier expansion of go and W\h u = , we have (W\go)(zo) = 
J2u ^G?0j ^) i+A7 ^( z o)- By the assumption that \V(go, h u )\ 2 ^ < oo, one obtains the bound 
\(W x go)(z )\ = O^h- 1 ) 1 / 2 ) = 0(h md -^ 2 ) by using Cauchy's inequality. Thus, by h X n -1 ^ 2 "*- 1 ) 
and d > 1 + l/(2m), (n/i) 1 / 2 (W / AS'o)(- z o) = Direct calculations verify h = o(l), n/i 2 — >• oo, 
a n = o((n/t)~ 1/2 + ^ an = ( n -l/2(i ogn )-i) ) ffln = ofa -1 ^) -1 / 2 + fc^J-^logn) -1 ), and 
«n 3> ((n/i) -1 / 2 + /j m ) 2 /i -1 / 2 . Thus, the desired result follows from Theorem 4.3. 

5.5. Proof of Theorem 4-6. It is easy to check that h x n~ d with l/(2m+l) < <i < 2m/(10m— 1) 
and m > l+\/3/2 satisfies all the rate conditions on a n and h stated in Theorem 4.3. Before formally 
giving the proof, we establish the contiguity between Pg n0 and P^. It can be shown that the log 
likelihood ratio 

(S.10) 

n n 

io g (p; n0 /p;j = n- 1 / 2 ^^*^ 

i=l i=l 

Thus, under g = g* and using (4.13), PgjPg, 4 exp(f), where £ ~ N(-t 2 /2,t 2 q ). Since 
P{exp(£)} = 1, by Theorem 3.10.2 of [55], Pg nQ is contiguous with respect to P". 

Next we prove the theorem. For notational convenience, denote g = g n ,\ and cf* = g^ x . Under g = 
<?*, since g*(zo) = wo, Hq : g{z$) = w$ automatically holds. It then follows from Assumptions A. 3 
and A. 4, and the proof of Theorem 4.3 that under g = g*, — 2n- LRT n ^\ = n\\wo + g° — g\\ 2 + op« (1). 
Applying Theorems 3.4 and 4.3, we have —2n-LRT n ^\ = n\\S n) \(g*)— S® A(ff*)|| 2 +°P™ under g = 
g*, where recall g° = g*- w . Also, under g = g*, ~Yh=i Ua(Yi', 9*{Zi))K ZQ {Zi) - E{£ a (Y; g*(Z))K ZQ (Z)} 
Opn ((n/i)~ 1/2 ). By contiguity between P" n0 and P™ , under g = g n0 , -2n ■ LRT n ^\ = n\\S ni \(g*) - 
Six(9°*)\\ 2 +opnJl), and I E?=i (4(^5 5*(^))^o(^) " ^{4(^5 ^(Z))J^(Z)}) = P nJ(nh)^/ 2 ). 

On the other hand, using hK(zo, Zq) x a 2 and a direct examination leads to 

1 n 

n||S„,, A ( 5 *)-S° A ( 5 °)|| 2 = -^^\-J2ia(Y i -,g*(Z i ))Kn(Z i )-(W x g.)(z )\ 2 

K {Zq, z ) n <?— ' 



i=l 



(S.H) x a- 2 |^— ^i^^;^^))^^) - (n^)V 2 (^)(zo) 12 

i=i 



Note under 5 = g n Q, by assumptions of the theorem and Taylor's expansion, 
1 n 

-Y,UYu9*{Zi))Kzo{Zi) 
n 

i=\ 
1 n 

i=l 

= Opn ((nh)- 1 / 2 ) + E{ f e a (Y;g n0 (Z) - s Vn f n (Z)) Vn f n (Z)K Zo (Z)ds} 
9n0 Jo 

= Op? ((nh)- 1 / 2 ) + E{[ [£ a (Y;g n0 (Z) - s Vn f n {Z)) - £ a (Y; g n o(Z))} Vn f n (Z)K Zo (Z)ds} - r, n V(f,K Zo ) 



Opn ({nh)- 1 ' 2 ) + ^~ 1 ||/r»||i a O(l) - Vnfn(zo) + Vn(W X f n )(z ) . 



(S.12) 
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Since nn 2 V (f n , fn) = 0(1) under g = g*, which also holds under g = g n Q by contiguity between 
Pl and P£, we have r^h^WUf^ = 0((n/ l )" 1 ) = O^nh)- 1 / 2 ). Since J(f n ,fn) < C a {n\r, 2 n )-\ 
by Fourier expansion and Cauchy's inequality, it can be shown that 

rin\(W X f n )(z )\ < ^ J(f n J n )(\h- l ) l l 2 0(l) = 0{(nh)- 1 ' 2 ). 

Therefore, l^M^g^ZftK^Zi) = OpnJinhy^-n-^fM.SmceZjVig^K)^ 2 < 

oo implying (W x g*)(z ) = 0(h m ) (see Remark 4.1), (nh) 1 / 2 (W\g*){z ) = O^nh) 1 ' 2 ^ 1 ). By as- 
sumption rj n > (n/i)" 1 / 2 + h m and |/ n (-zo)| — ?• oo as n — )• oo, and by (S.ll) and (S.12), the 
leading term in the approximation of — 2n • LRT n \ is ^"^{nhn^f^zo)^ which goes to infinity as 
n — > oo. Therefore, there exists some sufficiently large N such that for any n > N, under g = g n Q, 
—2n ■ LRT n ^\ > c a with probability (in terms of Pg n0 ) greater than 1 — 5, where c a is the a-cutoff 
associated with the limiting distribution described in Theorem 4.4. Balancing the lower bound of 
rj n one obtains when h = h* the minimum rate r] n = n - m /( 2m + 1 ) [ s achieved. 

To show n _m /( 2m + 1 ) i s the sharp lower bound for rj n , assume otherwise rj n <C n - m /( 2m + 1 ) . 
Let u) be a function defined over M. satisfying w(0) = 1, ui and u/ TO ) are square integrable. Since 
rj n < n -rn/(2m+l) and ( n/l )i/2 = ^ n m/(2m+l)) j we have Vn (nh)^ 2 = o(l). Choose c n such that, 

as n 4 oo, c n — > oo, c n r] n n 1 / 2 — > oo and Cnrj^nh) 1 ^ 2 = o(l). For instance, one can choose 
c n = max{(?7 n n 1 / 2 )" 1 ^ 1 / 4 , [^(n/i) 1 / 2 ]" 1 / 2 }. Define f n (z) = c n ui(c 2 n nn 2 n (z - z )) for z G I. By 
a direct calculation, J(f n J n ) = c 2 n {c 2 n nn 2 n ) 2m ft |u/ m ) <(c n nn 2 n (z - z ))\ 2 'dz = 0(c^{nn 2 n ) 2m - X ). 
Since Cn^nh) 1 / 2 = o(l), we have J(f n ,fn) = o{r]~ Am {nh)- 2m (nr/ 2 ) 2 ™- 1 ) = o^nAf? 2 )" 1 ). Clearly, 
fn(zo) = c n — >• oo. Since c n ?7 n n 1//2 — >• oo, we have under g = g*, 

nr] 2 V(f n ,f n ) = 



where recall that 7r is the density of Z and = — E?{i a (Y; g*(Z))\Z = z}. Therefore, f n satisfies 
(4.13). Following (S.10) and the arguments below, it can be shown that P™ t and P g ™ are con- 
tiguous. Then by the proofs of (S.ll) and (S.12), under g = g n $ = g* + rj n f n , —2n ■ LRT n \ = 
(hK(z ,z ))- l \Opn (1) + (nhW 2 ri n f n (zo) + 0((n/i 2m+1 ) 1/2 )| 2 + (!)• Since h >c n~ d with 

9nO Sn.0 

d e [l/(2m + l),2m/(10m - 1)], (n/i 2m+1 ) 1/2 = 0(1). Note when n -> oo, (nh) l / 2 r] n f n (z ) = 
CnVninh) 1 / 2 = o(l) eventually vanishes. So —2n-LRT n \ = Opn (1). This means, —2n-LRT n \ > c a 
with probability (in terms of -P^ ) bounded by 1 — <5o for some 5q £ (0, 1) unrelated to n. This 
proves the sharpness of the lower bound n 1 ^ 2( - 2m+1 ^ for rj n . 

S.6. Proof of Proposition 5.2. We first consider (5.2) and (5.3). (5.2) trivially holds. By bound- 
edness and absolute integrability of u), for any p £ (0,2], lim^-froo ^wl ^ — q, 

implying C p in (5.3) is actually zero. 

For general m, let h v s and 7„s be the normalized (with respect to the usual L2-norm) eigenfunc- 
tions and eigenvalues of the boundary value problem (— l) m hv = ^ v h v , (0) = hk - (1) = 0, 
j = m, m + 1, . . . , 2m — 1. Thus, it is easy to see that h v = ah u and 7„ = cr 2 ~f u satisfy (2.11) with 
ir(z)I(z) = cr~ 2 , implying that h v s and 7^s form an effective eigensystem in H m (T). Let = cr 2 A 



c n nVn I (z)n(z)\u(c n nri n (z - z ))\ 
Jo 

(Iir)(z + (c 2 n nn 2 n )t)\u;(t)\ 2 dt 
I(z )tt(z ) [ \oj(t)\ 2 dt, 
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and = a l / m h. Define K(s,t) = ^i+lt^^ • Then K is the reproducing kernel function associ- 
ated with the inner product (/, g) x = f(t)g{t)dt + At f^ m \t)g^ m \t)dt. Thus, K is the Green's 
function associated with the differential equation (2.1) in [41], with the penalty parameter therein 
replaced by AL 

Next we restrict m = 2. By Theorem 4.1 in [38], for j = 0, 1, 



(S.13) sup 

s.tel 



K{s,t)-K{s,t)) 



< C' K (tf)-( j+ Vexp(-sm(-K/(2m))/tf), 



where by equation (6) in [38], K satisfies for any s,t G I and j = 0,1, 
(S.14) 

d J ( ^, , 1 (s-€ 



dtn KM -hJ Uo \ m 



<C^(^)-(i+i)( exp (-|l- s |/(v / 2/it)) + exp(-| s |/(v / 2/ i t))), 



with C' k ,C'k both being positive constants. By (S.13) and (S.14), it is easy to see that for any 
s,t € I and j = 0, 1, 



(S.15) 



d? (~ 1 (s-t 



dP v v ' ' u v m 

< C^(/i t )- (i+1) (exp(-sin(7r/(2m))//i t ) + exp(-|l-s|/(v / 2/i t )) +exp(-|s|/(V2/i t ))), 



where C, Ck are positive constant. By Proposition 2.1, K (s, t) = J2 U = cr 2 K(s, t). There- 

fore, K(s,t)-h- l io((s-t)/h) = a 2 (K(s,t) - {h))- 1 ^^ -t)/h))). It can thus be shown that, by 

(5.15) , Condition (5.1) holds. 

S.7. Proof of Theorem 5.4- First of all, by direct calculations, one can verify by 2 m+i — d < - 

and m > that h >c n~ d satisfies the conditions in Theorem 5.3. 

Next we prove our theorem. We write

\[
-2n\cdot \mathrm{PLRT}_{n,\lambda} = -2n\big(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\big) - 2n\big(\ell_{n,\lambda}(g_{n0}) - \ell_{n,\lambda}(\hat g_{n,\lambda})\big). \tag{S.16}
\]

The proof proceeds in two parts. We first note that $-2n\cdot \mathrm{PLRT}' \equiv -2n(\ell_{n,\lambda}(g_{n0}) - \ell_{n,\lambda}(\hat g_{n,\lambda}))$ is actually the PLRT test for testing $H_{1n}$ against $H_1^{global}$. Under $H_{1n}$, $-2n\cdot \mathrm{PLRT}'$ has the same asymptotic distribution as in Theorem 5.3, but uniformly for all $g_n \in \mathcal{G}_a$. That is to say, $(2u_n)^{-1/2}(-2n r_K\cdot \mathrm{PLRT}' - n\|W_\lambda g_{n0}\|^2 - u_n) = O_P(1)$ uniformly for $g_n \in \mathcal{G}_a$, where $u_n = h^{-1}\sigma_K^4/\rho_K^2$ with $\sigma_K^2$ and $\rho_K^2$ given in (5.14). Second, we show that $-2n(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})) = n\|g_n\|^2 + O_P(n^{1/2}\|g_n\| + n^{1/2}\|g_n\|^2 + n\lambda)$. Then

\[
\begin{aligned}
(2u_n)^{-1/2}(-2n r_K\cdot \mathrm{PLRT} - u_n)
&\ge n(2u_n)^{-1/2}\|g_n\|^2\big(1 + O_P(n^{-1/2}\|g_n\|^{-1} + n^{-1/2} + \lambda\|g_n\|^{-2})\big) + (2u_n)^{-1/2} n\|W_\lambda g_{n0}\|^2 + O_P(1) \\
&\ge n(2u_n)^{-1/2}\|g_n\|^2\big(1 + O_P(n^{-1/2}\|g_n\|^{-1} + n^{-1/2} + \lambda\|g_n\|^{-2})\big) + O_P(1),
\end{aligned}
\]

where $O_P(\cdot)$ holds uniformly for $g_{n0} \in \mathcal{G}_a$. Let $n^{-1/2}\|g_n\|^{-1} \le 1/C$, $\lambda\|g_n\|^{-2} \le 1/C$ and $\|g_n\|^2 \ge C(nh^{1/2})^{-1}$ for sufficiently large $C$, which implies that $|(2u_n)^{-1/2}(-2n r_K\cdot \mathrm{PLRT} - u_n)| \ge c_\alpha$ with large probability, where $c_\alpha$ is the cutoff value (based on $N(0,1)$) for rejecting $H_0^{global}$ at level $\alpha$. This means we have to assume $\|g_n\|^2 \ge C(\lambda + (nh^{1/2})^{-1})$ to achieve large power.
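The rejection rule used in this argument can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function names and all input values are hypothetical, $r_K$ is passed in as a given constant, and the cutoff is taken to be the two-sided standard normal quantile $c_\alpha = \Phi^{-1}(1-\alpha/2)$, which is one natural reading of "based on $N(0,1)$" for a test that rejects when the absolute standardized statistic is large.

```python
import math
from statistics import NormalDist

def standardized_plrt(plrt, n, h, sigma_K2, rho_K2, r_K):
    """Return (2*u_n)^(-1/2) * (-2*n*r_K*PLRT - u_n), with u_n = sigma_K^4 / (h * rho_K^2)."""
    u_n = sigma_K2 ** 2 / (h * rho_K2)
    return (-2.0 * n * r_K * plrt - u_n) / math.sqrt(2.0 * u_n)

def rejects(stat, alpha=0.05):
    """Two-sided rejection rule |stat| >= c_alpha (an assumption about the cutoff)."""
    c_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(stat) >= c_alpha

# Hypothetical inputs, chosen only to exercise the formula.
stat = standardized_plrt(plrt=-0.01, n=1000, h=0.25, sigma_K2=2.0, rho_K2=1.0, r_K=2.0)
print(stat, rejects(stat))
```

With these inputs $u_n = 16$, so the statistic equals $(40 - 16)/\sqrt{32}$, well above the cutoff at the 5% level.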

Next we complete the above two parts. First, it can be established that the following "uniform" FBR holds, i.e., for any $\delta \in (0,1)$, there exist positive constants $C$ and $N$ such that

\[
\inf_{n\ge N}\ \inf_{g_n\in\mathcal{G}_a} P_{g_{n0}}\big(\|\hat g_{n,\lambda} - g_{n0} - S_{n,\lambda}(g_{n0})\| \le C a_n\big) \ge 1-\delta, \tag{S.17}
\]

where recall that $a_n$ is defined as in (3.5). The proof of (S.17) follows by a careful reexamination of Theorem 3.4. Specifically, one can choose $C$ and $M$ (unrelated to $g_n \in \mathcal{G}_a$) to be large so that the event $B_{n1}\cap B_{n2}$, defined in the proof of Theorem 3.4, has probability greater than $1-\delta/2$. Then by going through exactly the same proof, it can be shown that when $n \ge N$ for some suitably selected $N$, for any $g_n \in \mathcal{G}_a$, (A.9) holds with probability greater than $1-\delta/2$ (by properly tuning the probability), with the constant therein depending only on $C$, $M$, $c_m$. By going through the proofs of (A.10) and (A.11), it can be shown that for $n \ge N$ and $g_n \in \mathcal{G}_a$, with probability larger than $1-\delta$, $\|\hat g_{n,\lambda} - g_{n0} - S_{n,\lambda}(g_{n0})\| \le C a_n$, where the constants $C$ and $N$ are unrelated to $g_n \in \mathcal{G}_a$. Using (S.17) and by exactly the same proof as that of Theorem 5.3, it can be shown that $-2n\cdot \mathrm{PLRT}'$ follows the same asymptotic normal distribution under $H_{1n}: g = g_{n0}$ as in Theorem 5.3, uniformly for $g_n \in \mathcal{G}_a$.

Second, for notational simplicity, denote $R_i = \ell(Y_i; g_0(Z_i)) - \ell(Y_i; g_{n0}(Z_i))$ for $i = 1,\ldots,n$. Then

\[
E\Big\{\Big|\sum_{i=1}^n [R_i - E(R_i)]\Big|^2\Big\} \le nE\{R_i^2\} = nE\{|\epsilon_i g_n(Z_i) + g_n(Z_i)^2/2|^2\} = O(n\|g_n\|^2 + n\|g_n\|^4).
\]

Therefore, uniformly over $g_n \in \mathcal{G}_a$, $n(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0}) - E\{\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\}) = O_P(n^{1/2}\|g_n\| + n^{1/2}\|g_n\|^2)$.

On the other hand, $E\{DS_{n,\lambda}(g_{n0})g_n g_n\} = -E\{|g_n(Z)|^2\} - \lambda J(g_n,g_n) = -\|g_n\|^2$. Therefore,

\[
E\{\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\} = E\{S_{n,\lambda}(g_{n0})(-g_n) + (1/2)DS_{n,\lambda}(g_{n0})g_n g_n\} = \lambda J(g_{n0},g_n) - \|g_n\|^2/2.
\]

Since $|J(g_{n0},g_n)| \le |J(g_0,g_n)| + J(g_n,g_n) \le J(g_0,g_0)^{1/2}C^{1/2} + C$, we get that $2n(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})) = -n\|g_n\|^2 + O_P(n\lambda + n^{1/2}\|g_n\| + n^{1/2}\|g_n\|^2)$ uniformly for $g_n \in \mathcal{G}_a$. This completes the proof.
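The expansion of $R_i$ used in this step can be checked numerically. The sketch below assumes the Gaussian log-likelihood $\ell(y;a) = -(y-a)^2/2$ with unit variance, which is what the identity $E\{DS_{n,\lambda}(g_{n0})g_ng_n\} = -E\{|g_n(Z)|^2\} - \lambda J(g_n,g_n)$ suggests; the constants `g0`, `gn` and the function names are hypothetical. Under $Y = g_{n0}(Z) + \epsilon$, it verifies that $R = \ell(Y;g_0) - \ell(Y;g_{n0})$ equals $-\epsilon g_n - g_n^2/2$ pointwise and that its mean is close to $-g_n^2/2$.

```python
import random

def loglik(y, a):
    """Gaussian log-likelihood up to an additive constant, unit variance (assumed)."""
    return -0.5 * (y - a) ** 2

def r_direct(y, g0, gn):
    """R = l(y; g0) - l(y; g0 + gn), computed directly from the likelihood."""
    return loglik(y, g0) - loglik(y, g0 + gn)

def r_closed(eps, gn):
    """Closed form -eps*gn - gn^2/2, valid when y = g0 + gn + eps."""
    return -eps * gn - 0.5 * gn ** 2

def simulate(gn=0.3, g0=1.0, n=200_000, seed=0):
    """Return the Monte Carlo mean of R and the largest gap between the two forms."""
    rng = random.Random(seed)
    total, max_gap = 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        y = g0 + gn + eps  # data generated under the local alternative
        r = r_direct(y, g0, gn)
        max_gap = max(max_gap, abs(r - r_closed(eps, gn)))
        total += r
    return total / n, max_gap

mean_r, gap = simulate()
print(mean_r, gap)
```

Here `g0` and `gn` are constants standing in for the function values at a fixed design point; the mean of $R$ is then approximately $-0.3^2/2$, matching the $-\|g_n\|^2/2$ term in the expectation.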

S.8. Minimax separation rates of PLRT test in general modeling framework. To conclude this supplementary document, we remark that in a more general modeling framework PLRT achieves the optimal minimax rate of hypothesis testing specified in Ingster (1993). The proofs are similar to those of Theorem 5.4 but require a deeper technical tool, i.e., the mapping principle, which builds equivalence between the eigenvalues obtained under the null and under contiguous alternatives. We still write the local alternative as $H_{1n}: g = g_{n0}$, where $g_{n0} = g_0 + g_n$, $g_0 \in H^m(\mathbb{I})$ and $g_n$ belongs to some alternative value set $\mathcal{G}_a$.

Theorem 6.2. Let $m > (3+\sqrt{5})/4 \approx 1.309$, and $h \asymp n^{-d}$ for $\frac{1}{2m+1} \le d < \frac{2m}{8m-1}$. Let Assumption A.1 (a) hold for constants $C_0$, $C_1$, a compact interval $\mathcal{I}_0$ and an open interval $\mathcal{I}$ with $\mathcal{I}_0 \subset \mathcal{I}$. There is a constant $C_2 > 0$ such that $1/C_2 \le -\ddot\ell_a(Y;a) \le C_2$ holds for any $a \in \mathcal{I}$. The values of $2g_0$ belong to $\mathcal{I}_0$. Consider the alternative value set

\[
\mathcal{G}_a = \{g \in H^m(\mathbb{I}) \mid 2g(z) \in \mathcal{I} \text{ for any } z \in \mathbb{I},\ \|g\|_{\sup} \le \zeta,\ J(g,g) \le M\},
\]

where $\zeta = 1/(2C_0C_1C_2)$ and $M$ is a positive constant. Suppose under $H_{1n}: g = g_{n0}$ for $g_n \in \mathcal{G}_a$, Assumptions A.1 (c) and A.2 hold (with $g_0$ therein replaced by $g_{n0}$), $E\{\epsilon_{n0}^4 \mid Z\} \le C$, a.s., for some constant $C > 0$, with $\epsilon_{n0} = \dot\ell_a(Y; g_{n0}(Z))$, and uniformly over $g_{n0} \in \mathcal{G}_a$, $\|\hat g_{n,\lambda} - g_{n0}\| = O_P(r_n)$ holds under $H_{1n}: g = g_{n0}$. Then for any $\delta \in (0,1)$, there exist positive constants $C'$ and $N$ such that

\[
\inf_{n\ge N}\ \inf_{\substack{g_n\in\mathcal{G}_a \\ \|g_n\| \ge C'\eta_n}} P\big(\text{reject } H_0^{global} \mid H_{1n} \text{ is true}\big) \ge 1-\delta, \tag{S.18}
\]

where $\eta_n \ge \sqrt{h^{2m} + (nh^{1/2})^{-1}}$. The minimal lower bound of $\eta_n$, i.e., $n^{-2m/(4m+1)}$, is achieved when $h = h^{**} = n^{-2/(4m+1)}$.
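The claimed rate can be checked directly from the bound $\eta_n^2 \ge h^{2m} + (nh^{1/2})^{-1}$. The sketch below is an illustration only, with hypothetical function names. Setting the derivative in $h$ to zero gives the exact minimizer $h = (4mn)^{-2/(4m+1)}$, which has the same order as $h^{**} = n^{-2/(4m+1)}$, and the minimized bound then decays in $n$ with log-log slope $-2m/(4m+1)$.

```python
import math

def eta_squared(h, n, m):
    """Squared separation bound h^(2m) + (n * h^(1/2))^(-1)."""
    return h ** (2 * m) + 1.0 / (n * math.sqrt(h))

def h_minimizer(n, m):
    """Exact minimizer of eta_squared over h > 0: (4*m*n)^(-2/(4m+1))."""
    return (4 * m * n) ** (-2.0 / (4 * m + 1))

def rate_slope(m, n1=1e4, n2=1e6):
    """Log-log slope in n of the minimized eta; compare with -2m/(4m+1)."""
    e1 = math.sqrt(eta_squared(h_minimizer(n1, m), n1, m))
    e2 = math.sqrt(eta_squared(h_minimizer(n2, m), n2, m))
    return (math.log(e2) - math.log(e1)) / (math.log(n2) - math.log(n1))

m = 2
print(rate_slope(m), -2 * m / (4 * m + 1))
```

The two printed numbers agree up to floating-point error, since at the minimizer the two terms of the bound are proportional and $\eta_n^2 = (1+4m)(4mn)^{-4m/(4m+1)}$.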
PROOF of Theorem 6.2. First of all, by direct calculations, one can verify that, under $\frac{1}{2m+1} \le d < \frac{2m}{8m-1}$ and $m > (3+\sqrt{5})/4$, the choice $h \asymp n^{-d}$ satisfies the conditions in Theorem 5.3. Throughout, we only consider $g_{n0} = g_0 + g_n$ for $g_n \in \mathcal{G}_a$.

Next we prove our theorem. We write

\[
-2n\cdot \mathrm{PLRT}_{n,\lambda} = -2n\big(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\big) - 2n\big(\ell_{n,\lambda}(g_{n0}) - \ell_{n,\lambda}(\hat g_{n,\lambda})\big). \tag{S.19}
\]

The proof proceeds in two parts. We first note that $-2n\cdot \mathrm{PLRT}' \equiv -2n(\ell_{n,\lambda}(g_{n0}) - \ell_{n,\lambda}(\hat g_{n,\lambda}))$ is actually the PLRT test for testing $H_{1n}$ against $H_1^{global}$. Under $H_{1n}$, $-2n\cdot \mathrm{PLRT}'$ has the same asymptotic distribution as described in Theorem 5.3, but uniformly for all $g_n \in \mathcal{G}_a$. That is to say, $(2u_{n0})^{-1/2}(-2n\cdot \mathrm{PLRT}'_{n,\lambda} - n\|W_\lambda g_{n0}\|^2 - h^{-1}\sigma_{Kn0}^2) = O_P(1)$ uniformly for $g_{n0} = g_0 + g_n$ with $g_n \in \mathcal{G}_a$, where $u_{n0} = h^{-1}\sigma_{Kn0}^4/\rho_{Kn0}^2$ under $g = g_{n0}$ and $\sigma_{Kn0}^2$, $\rho_{Kn0}^2$ are given in (5.14) with eigenvalues therein derived under $g = g_{n0}$. Denote $u_n = h^{-1}\sigma_K^4/\rho_K^2$ under $g = g_0$ with $\sigma_K^2$, $\rho_K^2$ given in (5.14). Let $V_{g_{n0}}$ and $V_{g_0}$ be the $V$ functionals defined as in Section 2.2 under $g = g_{n0}$ and $g = g_0$ respectively. Then for any $f \in H^m(\mathbb{I})$, by Assumption A.1 (a) and (b),

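\[
\begin{aligned}
|V_{g_{n0}}(f,f) - V_{g_0}(f,f)| &= \big|E\{[\ddot\ell_a(Y; g_{n0}(Z)) - \ddot\ell_a(Y; g_0(Z))]\,|f(Z)|^2\}\big| \\
&\le E\Big\{\sup_{a\in\mathcal{I}} |\ell'''_a(Y;a)| \cdot |g_n(Z)| \cdot |f(Z)|^2\Big\} \\
&\le C_0C_1C_2\,\|g_n\|_{\sup} V_{g_{n0}}(f,f) = \tilde C_0\,\|g_n\|_{\sup} V_{g_{n0}}(f,f),
\end{aligned}
\]

where $\tilde C_0 = C_0C_1C_2 = 1/(2\zeta)$ is a universal constant. Therefore, $(1 - \tilde C_0\|g_n\|_{\sup})V_{g_{n0}}(f,f) \le V_{g_0}(f,f) \le (1 + \tilde C_0\|g_n\|_{\sup})V_{g_{n0}}(f,f)$. By the mapping principle (see Theorem 6.1 in [59]), the eigenvalues induced by the functional pairs $(V_{g_{n0}}, J)$ and $(V_{g_0}, J)$ are thus equivalent in the sense that $(1 - \tilde C_0\|g_n\|_{\sup})\gamma_\nu^{n0} \le \gamma_\nu \le (1 + \tilde C_0\|g_n\|_{\sup})\gamma_\nu^{n0}$ for any $\nu \in \mathbb{N}$, where $\gamma_\nu^{n0}$ denotes the eigenvalue corresponding to $V_{g_{n0}}$ and $\gamma_\nu$ is the eigenvalue corresponding to $V_{g_0}$. Therefore, uniformly for $g_{n0}$,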



\[
\sigma_{Kn0}^2 - \sigma_K^2 = \sum_\nu \frac{h\lambda(\gamma_\nu - \gamma_\nu^{n0})}{(1+\lambda\gamma_\nu^{n0})(1+\lambda\gamma_\nu)} = O(\|g_n\|_{\sup}) = O(h^{-1/2}\|g_n\|).
\]

Secondly, we show that $-2n(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})) \ge nC'\|g_n\|^2 + O_P(n^{1/2}\|g_n\| + n\lambda)$, where $C'$ is some positive constant unrelated to $f$. Then

\[
\begin{aligned}
(2u_n)^{-1/2}(-2n r_K\cdot \mathrm{PLRT} - u_n)
&= r_K(2u_n)^{-1/2}\big(-2n\cdot \mathrm{PLRT}'_{n,\lambda} - n\|W_\lambda g_{n0}\|^2 - h^{-1}\sigma_{Kn0}^2\big) + r_K(2u_n)^{-1/2} n\|W_\lambda g_{n0}\|^2 \\
&\quad - r_K(2u_n)^{-1/2}\cdot 2n\big(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\big) + r_K(2u_n)^{-1/2} h^{-1}(\sigma_{Kn0}^2 - \sigma_K^2) \\
&\ge O_P(1) + nC' r_K(2u_n)^{-1/2}\|g_n\|^2\big(1 + O_P(n^{-1/2}\|g_n\|^{-1} + \lambda\|g_n\|^{-2})\big) + O(h^{-1}\|g_n\|),
\end{aligned}
\]

where $O_P(\cdot)$ holds uniformly for $g_n \in \mathcal{G}_a$. Let $n^{-1/2}\|g_n\|^{-1} \le 1/C$, $\lambda\|g_n\|^{-2} \le 1/C$, $Ch^{-1}\|g_n\| \le nh^{1/2}\|g_n\|^2$, and $\|g_n\|^2 \ge C(nh^{1/2})^{-1}$ for sufficiently large $C$, which implies that $|(2u_n)^{-1/2}(-2n r_K\cdot \mathrm{PLRT} - u_n)| \ge c_\alpha$ with large probability, where $c_\alpha$ is the cutoff value (based on $N(0,1)$) for rejecting $H_0^{global}$ at nominal level $\alpha$. This means we have to assume $\|g_n\|^2 \ge C(\lambda + (nh^{1/2})^{-1})$ to achieve large power.

Next we complete the above two parts. First, it can be established that the following "uniform" FBR holds, i.e., for any $\delta \in (0,1)$, there exist positive constants $C$ and $N$ such that

\[
\inf_{n\ge N}\ \inf_{g_n\in\mathcal{G}_a} P_{g_{n0}}\big(\|\hat g_{n,\lambda} - g_{n0} - S_{n,\lambda}(g_{n0})\| \le C a_n\big) \ge 1-\delta, \tag{S.20}
\]

where recall that $a_n$ is defined as in (3.5). The proof of (S.20) follows by a careful reexamination of Theorem 3.4. Specifically, one can choose $C$ and $M$ (unrelated to $g_n \in \mathcal{G}_a$) to be large so that the event $B_{n1}\cap B_{n2}$, defined in the proof of Theorem 3.4, has probability greater than $1-\delta/2$. Then by going through exactly the same proof, it can be shown that when $n \ge N$ for some suitably selected $N$, for any $g_n \in \mathcal{G}_a$, (A.9) holds with probability greater than $1-\delta/2$ (by properly tuning the probability), with the constant therein depending only on $C$, $M$, $c_m$. By going through the proofs of (A.10) and (A.11), it can be shown that for $n \ge N$ and $g_n \in \mathcal{G}_a$, with probability larger than $1-\delta$, $\|\hat g_{n,\lambda} - g_{n0} - S_{n,\lambda}(g_{n0})\| \le C a_n$, where the constants $C$ and $N$ are unrelated to $g_n \in \mathcal{G}_a$. Using (S.20) and by exactly the same proof as that of Theorem 5.3, it can be shown that $-2n\cdot \mathrm{PLRT}'$ follows the same asymptotic normal distribution under $H_{1n}: g = g_{n0}$ as in Theorem 5.3, uniformly for $g_n \in \mathcal{G}_a$.

For simplicity, denote $R_i = \ell(Y_i; g_0(Z_i)) - \ell(Y_i; g_{n0}(Z_i))$ for $i = 1,\ldots,n$. Then

\[
E\Big\{\Big|\sum_{i=1}^n [R_i - E(R_i)]\Big|^2\Big\} \le nE\{R_i^2\} = nE\Big\{\Big|-\epsilon_i g_n(Z_i) + \ddot\ell_a(Y_i; g_{n0}^*(Z_i))\,g_n(Z_i)^2/2\Big|^2\Big\}, \tag{S.21}
\]

where $g_{n0}^*(z) = g_0(z) + t^* g_n(z)$ for some $t^* \in (0,1)$, implying $g_{n0}^*(z) \in \mathcal{I}$ for any $z$. By Assumption A.1, we get that (S.21) is uniformly $O(n\|g_n\|^2)$ over $g_n \in \mathcal{G}_a$. Therefore, uniformly over $g_n \in \mathcal{G}_a$, $n(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0}) - E\{\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\}) = O_P(n^{1/2}\|g_n\|)$.

On the other hand, by $\sup_{a\in\mathcal{I}} \ddot\ell_a(Y;a) < 0$, we can find $C' > 0$ (unrelated to $g_n \in \mathcal{G}_a$) such that $E\{DS_{n,\lambda}(g_{n0}^*)g_n g_n\} = E\{\ddot\ell_a(Y; g_{n0}^*(Z))|g_n(Z)|^2\} - \lambda J(g_n,g_n) \le -C'\|g_n\|^2/2$. Therefore,

\[
\begin{aligned}
E\{\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})\} &= E\{S_{n,\lambda}(g_{n0})(-g_n) + (1/2)DS_{n,\lambda}(g_{n0}^*)g_n g_n\} \\
&\le \lambda J(g_{n0},g_n) - C'\|g_n\|^2/2 = O(\lambda) - C'\|g_n\|^2/2,
\end{aligned}
\]

where the last equality holds by $J(g_n,g_n) \le M$ and $|J(g_{n0},g_n)| \le |J(g_0,g_n)| + J(g_n,g_n) \le J(g_0,g_0)^{1/2}M^{1/2} + M$. Consequently, $2n(\ell_{n,\lambda}(g_0) - \ell_{n,\lambda}(g_{n0})) \le -nC'\|g_n\|^2 + O_P(n\lambda + n^{1/2}\|g_n\|)$. This completes the proof.